ABSTRACT

A reachability oracle (or hop labeling) assigns each vertex v two
sets of vertices: $L_{out}(v)$ and $L_{in}(v)$, such that u reaches v iff
$L_{out}(u) \cap L_{in}(v) \neq \emptyset$. Despite their simplicity and elegance,
reachability oracles have failed to achieve efficiency in more than 
ten years since their introduction: the main problem is high con- 
struction cost, which stems from a set-cover framework and the 
need to materialize transitive closure. In this paper, we present two 
simple and efficient labeling algorithms, Hierarchical-Labeling and 
Distribution-Labeling, which can work on massive real-world graphs:
their construction time is an order of magnitude faster than the set- 
cover based labeling approach, and transitive closure materializa- 
tion is not needed. On large graphs, their index sizes and their 
query performance can now beat the state-of-the-art transitive clo- 
sure compression and online search approaches. 

1. INTRODUCTION 

As one of the most fundamental graph operators, reachability 
has drawn much research interest in recent years [5] [30] [8] [28] [17]
[9] [6] [16] [33] [31] [4] [29] [15] 13 and seems to continue fascinating
researchers with new focuses [19] [32] [18] and new variants [13]
[10] [25]. The basic reachability query answers whether a vertex
u can reach another vertex v using a simple path (u → v) in a
directed graph. It has a wide range of applications from software 
engineering, to distributed computing, to biomedical and social net- 
work analysis, to XML and the semantic web, among others. 

The majority of the existing reachability computation approaches
belong to either transitive closure materialization (compression)
IU [21] [30] [17] [29] or online search [5] [28] [31]. The transitive
closure compression approaches tend to be faster but generally have
difficulty scaling to massive graphs due to the precomputation and/or
memory cost. Online search is slower (often by one or two orders of
magnitude) but can work on large graphs [31] [19]. The latest
research [19] introduces a unified SCARAB method based on a
"reachability backbone" (similar to the highways in a transportation
network) to deal with their limitations: it can both help scale
the transitive closure approaches and speed up online search. However,
the query performance of transitive closure approaches tends
to be slowed down, and they may still not work if the size of the
reachability backbone remains too large [19].

The reachability oracle, more commonly known as hop labeling
[11] [27], is an interesting third category of approaches which lies
between transitive closure materialization and online search. Each
vertex v is labeled with two sets: $L_{out}(v)$, which contains hops
(vertices) v can reach; and $L_{in}(v)$, which contains hops that can
reach v. Given $L_{out}(u)$ and $L_{in}(v)$, but nothing else, we can
compute whether u reaches v by determining whether there is at least
one common hop, i.e., $L_{out}(u) \cap L_{in}(v) \neq \emptyset$. The idea is
simple, elegant, and seems very promising: hop labeling can be considered
a factorization of the binary matrix of transitive closure; thus it should
be able to deliver more compact indices than the transitive closure and
also offer fast query performance.

Unfortunately, more than ten years after its first proposal
[11] and a list of worthy attempts [25] [8] [9] [16] 0, hop labeling,
or the reachability oracle, still eludes us and still fails to meet
expectations. Despite its appealing theoretical nature, recent studies
[19] [29] [10] [31] all seem to confirm its inability to handle
real-world large graphs: hop labeling is expensive to construct, taking
much longer than other approaches, and can barely work on
large graphs, due to the prohibitive memory cost of the construction
algorithm. Many studies [19] [29] [10] [31] also show up to an order
of magnitude slower query performance compared with the fastest
transitive closure compression approaches (though we discover that the
underlying reason is mainly the implementation of the hop labels
$L_{out}$ and $L_{in}$; employing a sorted vector/array instead of a
set can largely eliminate the query performance gap).
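
As a minimal sketch of the sorted-array idea mentioned above, the query can be answered with a two-pointer scan over the two label arrays. The function name `shares_hop` and the sample labels are hypothetical and serve only as an illustration; they are not taken from the authors' implementation.

```python
def shares_hop(l_out_u, l_in_v):
    """Two-pointer scan over two ascending arrays of hop IDs.

    Returns True as soon as a common hop is found, i.e. when
    L_out(u) and L_in(v) intersect.
    """
    i, j = 0, 0
    while i < len(l_out_u) and j < len(l_in_v):
        a, b = l_out_u[i], l_in_v[j]
        if a == b:
            return True          # common hop found: u reaches v
        if a < b:
            i += 1               # advance the pointer at the smaller ID
        else:
            j += 1
    return False


# Hypothetical labels, for illustration only.
L_out_u = [3, 7, 25, 40]
L_in_v = [7, 14]
print(shares_hop(L_out_u, L_in_v))
```

The scan touches each array element at most once, so a query costs time linear in the two label sizes and reads memory sequentially, which is the likely source of the speedup over a hash- or tree-based set.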

The high construction cost of the reachability oracle is inherent
to the existing labeling algorithms and directly results in the
scalability bottleneck. In order to minimize the labeling size, many
algorithms [11] [23] [8] [16] [4] rely on a greedy set-cover procedure,
which involves two costly operations: 1) repeatedly finding densest
subgraphs from a large number of bipartite graphs; and 2) materialization
of the entire transitive closure. The latter is needed since
each reachability pair needs to be explicitly covered by a selected
hop. Even with a concise transitive closure representation, such as
the geometric format [8], or with fewer pairs to cover, as in
3-hop [16] [4], the overall construction complexity is still close to or
more than $O(n^3)$, which is too expensive for large graphs.
Alternative labeling algorithms [27] [9] try to use graph separators,
but only special graph classes with small graph separators, such as
planar graphs, can adopt such techniques well [27]. For
general graphs, the scalability of such an approach [9] is limited by
the lack of good scalable partition algorithms for discovering graph
separators on large graphs.

Can the reachability oracle be practical? Is it a purely theoretical
concept which can only work on small toy graphs, or is it a powerful
tool which can shape reality and work on real-world large
graphs with millions of vertices and edges? Arguably, this is one
of the most important unsolved puzzles in reachability computation.
This work resolves these questions by presenting two simple
and efficient labeling algorithms, Hierarchical-Labeling and
Distribution-Labeling, which can work on massive real-world graphs.
Their construction is as fast as that of the state-of-the-art transitive
closure compression approaches; there is no expensive transitive
closure materialization, dense subgraph detection, or greedy
set-cover procedure; there is no need for graph separators; and on
large graphs, their index sizes and their query performance beat
the state-of-the-art transitive closure compression and online search
approaches [17] [29] [19] [29] [10] [31]. Using these two algorithms,
the power of hop labeling is finally unleashed and a fast, compact
and scalable reachability oracle becomes a reality.

The rest of the paper is organized as follows. In Section 2, we
review the prior work on reachability. In Section 3, we give an
overview of the basic ideas of constructing a fast, compact and
scalable reachability oracle. In Section 4, we present the
Hierarchical-Labeling algorithm, which is based on a hierarchical
decomposition of a DAG (directed acyclic graph). In Section 5, we introduce
the Distribution-Labeling algorithm, which utilizes a total vertex order.
In Section 6, we report the detailed experimental study on these two
new labeling algorithms compared with the state-of-the-art
reachability computation approaches. We offer concluding remarks in
Section 7.

2. RELATED WORK 

To compute the reachability, the directed graph is typically trans- 
formed into a DAG (directed acyclic graph) by coalescing strongly 
connected components into vertices, avoiding the trivial case where 
vertices reach each other in a strongly connected component. The 
size of the DAG is often much smaller than that of the original 
graph and is more convenient for reachability indexing. Let G = 
(V, E) be the DAG for a reachability query, with number of vertices
n = |V| and number of edges m = |E|.

2.1 Transitive Closure and Online Search 

There are two extremes in computing reachability. At one end,
the entire transitive closure (TC) of G is precomputed and fully
materialized (often in a binary matrix). Since the reachability
between any pair is recorded, reachability can be answered in constant
time, though the $O(n^2)$ storage is prohibitive for large graphs. At
the other end, DFS/BFS can be employed. Though it does not need
an additional index, its query answering time is too slow for large
graphs. As we mentioned before, the majority of the reachability
computation approaches aim to either compress the transitive
closure H [14] [21] [30] [17] [7] [29] [15] or to speed up the online
search [5] [28] [31].

Transitive Closure Compression: This family of approaches aims
to compress the transitive closure: each vertex u records a compact
representation of TC(u), i.e., all the vertices it reaches. The
reachability from vertex u to v is computed by checking vertex v
against TC(u). Representative approaches include chain compression
[14] [6], interval or tree compression [2] [21], dual-labeling [30],
path-tree [17], and bit-vector compression [29]. Using interval
compression as an example, any contiguous vertex segment in the
original TC(u) is represented by an interval. For instance, if TC(u)
is {1, 2, 3, 4, 8, 9, 10}, it can be represented as two intervals: [1, 4]
and [8, 10].
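
The interval example above can be sketched in a few lines. The helper names `compress_intervals` and `contains` are hypothetical, and vertex IDs are assumed to be integers; the cited approaches assign IDs and build intervals in more elaborate ways.

```python
from bisect import bisect_right


def compress_intervals(tc):
    """Collapse integer vertex IDs into sorted, disjoint (lo, hi) intervals."""
    intervals = []
    for x in sorted(set(tc)):
        if intervals and x == intervals[-1][1] + 1:
            intervals[-1][1] = x          # extend the current run
        else:
            intervals.append([x, x])      # start a new run
    return [tuple(iv) for iv in intervals]


def contains(intervals, v):
    """Binary search for the interval that could hold v."""
    # A real index would store the interval starts once instead of
    # rebuilding this list on every query.
    starts = [lo for lo, _ in intervals]
    k = bisect_right(starts, v) - 1
    return k >= 0 and intervals[k][0] <= v <= intervals[k][1]


tc_u = {1, 2, 3, 4, 8, 9, 10}
print(compress_intervals(tc_u))
```

Checking v against TC(u) then reduces to one binary search over the interval starts followed by a single range comparison.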

Existing studies [29] [31] [19] have shown these approaches are
the fastest in terms of query answering since checking against the
transitive closure TC(u) is typically quite simple (a linear scan or binary
search suffices); in particular, the interval and path-tree approaches
seem to be the best in terms of query answering performance. However,
the transitive closure materialization, despite compression, is
still costly. The index size is often the reason these approaches are
not scalable on large graphs [31] [19].

Fast Online Search: Instead of materializing the transitive closure,
this set of approaches [5] [28] [31] aims to speed up the online
search. To achieve this, auxiliary labeling information per vertex is
precomputed and utilized for pruning the search space. Using the
state-of-the-art GRAIL [31] as an example, each vertex is assigned
multiple interval labels, where each interval is computed by a random
depth-first traversal. The intervals can help determine whether
a vertex in the search space can be immediately pruned because it
never reaches the destination vertex v.
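
The pruning idea can be illustrated with a simplified, single-interval variant. This is a sketch under stated assumptions rather than GRAIL itself: it uses one interval per vertex instead of several, assigns each vertex the pair (smallest post-order rank among its descendants, its own post-order rank), and the names `interval_labels`, `may_reach` and `reachable` are hypothetical. If u reaches v, the interval of v is contained in that of u, so a failed containment test proves non-reachability and the vertex can be skipped.

```python
import random


def interval_labels(adj, seed=0):
    """One random-DFS interval per vertex of a DAG.

    adj maps every vertex to a list of successors. Recursion is used for
    brevity, so this is only suitable for small graphs.
    """
    rng = random.Random(seed)
    label, counter = {}, 0

    def visit(v):
        nonlocal counter
        low = None
        children = list(adj.get(v, []))
        rng.shuffle(children)               # random traversal order
        for c in children:
            if c not in label:
                visit(c)
            low = label[c][0] if low is None else min(low, label[c][0])
        counter += 1                        # post-order rank of v
        label[v] = (counter if low is None else min(low, counter), counter)

    roots = list(adj)
    rng.shuffle(roots)
    for r in roots:
        if r not in label:
            visit(r)
    return label


def may_reach(label, u, v):
    """False means u certainly does not reach v; True is inconclusive."""
    return label[u][0] <= label[v][0] and label[v][1] <= label[u][1]


def reachable(adj, label, u, v):
    """DFS from u that skips vertices whose interval excludes v."""
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in adj.get(x, []):
            if y not in seen and may_reach(label, y, v):
                seen.add(y)
                stack.append(y)
    return False
```

Because containment is only a necessary condition, the search still has to walk the graph when the test passes, which is why such methods remain slower than a direct lookup in a materialized index.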

The pre-computation of the auxiliary labeling information in these 
approaches is generally quite light; the index size is also small. 
Thus, these approaches can be applicable to very large graphs. 
However, the query performance is not appealing; even the
state-of-the-art GRAIL can easily be one or two orders of magnitude
slower than the fast interval and path-tree approaches [31] [19]. For
very large graphs, these approaches may be too slow for answering
reachability queries.

2.2 Reachability Oracle 

The reachability oracle [11] [27], also referred to as hop labeling,
was pioneered by Cohen et al. [11]. Though it also encodes the
transitive closure, it does not explicitly compress the transitive closure
of each individual vertex independently (unlike the transitive
closure compression approaches). Here, each vertex v is labeled with
two sets: $L_{out}(v)$, which contains hops (vertices) v can reach; and
$L_{in}(v)$, which contains hops that can reach v. Given $L_{out}(u)$ and
$L_{in}(v)$, but nothing else, we can compute whether u reaches v by
determining whether there is a common hop, i.e., whether
$L_{out}(u) \cap L_{in}(v) \neq \emptyset$. In
fact, a reachability oracle can be considered a factorization of
the binary matrix of transitive closure [16]; thus more compact
indices are expected from such a scheme.

The seminal 2-hop labeling [11] aims to minimize the reachability
oracle size, which is the total label size
$\sum_{u \in V} (|L_{out}(u)| + |L_{in}(u)|)$. It employs an approximate
(greedy) algorithm based on set-covering which can produce a reachability
oracle whose size is larger than the optimal one by at most a logarithmic
factor. The optimal 2-hop index size is conjectured to be $O(n m^{1/2})$.

The major problem of the 2-hop indexing approach is its high
construction cost. The greedy set-covering algorithm needs to iteratively
find a vertex v associated with two subsets of vertices X and
Y which utilize v as the intermediate hop, i.e., $v \in L_{out}(x)$ for
$x \in X$ and $v \in L_{in}(y)$ for $y \in Y$. To select vertex v and its
associated X and Y, the greedy procedure utilizes a price, which measures
the cost-benefit tradeoff between recording the vertex in $L_{out}(x)$ and
$L_{in}(y)$ (cost) and the number of reachability pairs being newly
covered (benefit) by such labeling:

$\frac{|X| + |Y|}{|X \times Y \setminus C|}$,

where C is the set of reachable pairs already covered by previously
selected hops. This selection step can be transformed into the problem
of finding a densest subgraph in n bipartite graphs. The approximation
algorithm for this subproblem runs in time linear in the
number of edges in the bipartite graph. Such an iterative approach
can be as costly as $O(n^3 |TC|)$, where |TC| is the total size of the
transitive closure.
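
As a small worked illustration of the price just described, the ratio can be computed directly from X, Y and the set C of already covered pairs. The function name `price` and the toy sets are hypothetical; the greedy procedure in the cited work searches for the X and Y that minimize this ratio, which this sketch does not attempt.

```python
def price(X, Y, covered):
    """Cost/benefit ratio (|X| + |Y|) / |X x Y \\ C| for one candidate hop.

    X: vertices that would record the hop in L_out.
    Y: vertices that would record the hop in L_in.
    covered: set of (x, y) pairs already covered by earlier hops.
    """
    newly_covered = sum(1 for x in X for y in Y if (x, y) not in covered)
    if newly_covered == 0:
        return float("inf")      # no benefit: never worth selecting
    return (len(X) + len(Y)) / newly_covered


# Toy example: 4 label entries cover 4 new pairs, then only 3.
print(price({1, 2}, {3, 4}, set()))
print(price({1, 2}, {3, 4}, {(1, 3)}))
```

A lower price means more newly covered pairs per label entry, so the greedy step prefers candidates with the smallest value.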

A number of approaches have sought to reduce the construction cost
by speeding up the set-cover procedure [23], using a concise
transitive closure representation [8], or reducing the covered pairs
using 3-hop [16] [4]. However, they still need to repeatedly find
densest subgraphs from a large number of bipartite graphs and to
materialize the transitive closure to explicitly confirm that each
reachable pair is covered by the hop labeling. Alternative labeling
algorithms [27] [9] try to use graph separators, but only special graph
classes with small graph separators, such as planar graphs,
can adopt such techniques well [27]. For general graphs, the
scalability of such an approach [9] is limited by the lack of good
scalable partition algorithms for discovering graph separators on large
graphs.

2.3 Reachability Backbone and SCARAB 

In the latest study [19], the authors introduce a general framework,
referred to as SCARAB (SCAling ReachABility), for scaling
the existing reachability indices (including both transitive closure
compression and hop labeling approaches) and for speeding up the
online search approaches. The central idea is to leverage a
"reachability backbone" (like highways in a road network), which carries
the major "reachability flow" information.

Formally, the reachability backbone $G^* = (V^*, E^*)$ of graph
G is defined as a subgraph of the transitive closure of G
($E^* \subseteq TC(G)$), such that for any reachable pair (u, v), there must
exist local neighbors $u^* \in V^*$, $v^* \in V^*$ with respect to the locality
threshold ε, i.e., $d(u, u^*) \le \epsilon$ and $d(v^*, v) \le \epsilon$, with
$u^* \to v^*$. Here $d(u, u^*)$ is the shortest path distance from u to
$u^*$, where each edge has unit weight. To compute the reachability from u to
v, u collects a list of local outgoing backbone vertices (entries) using
forward BFS, and v collects a list of local incoming backbone
vertices (exits) using backward BFS. Then an existing reachability
approach can be utilized to determine if there is a local entry
reaching a local exit on the reachability backbone $G^*$.

Two algorithms are developed to approximate the minimal backbone,
one based on set-cover and the other based on BFS. The latter,
referred to as FastCover, is particularly efficient and effective,
with time complexity
$O(\sum_{v \in V} |N_\epsilon(v)| \log |N_\epsilon(v)| + |E_\epsilon(v)|)$, where
$N_\epsilon(v)$ ($E_\epsilon(v)$) is the set of vertices (edges) v can reach in ε steps.
Experiments show that even with ε = 2, the size of the reachability
backbone is significantly smaller than the original graph (about
1/10 the number of vertices of the original graph). As we will
discuss later, our first algorithm, Hierarchical-Labeling, is directly
inspired by the reachability backbone and effectively utilizes it for
reachability oracle construction.

Though the scaling approach is quite effective for helping deal 
with large graphs, it is still constrained by the power of the original 
index approaches. For many large graphs, the reachability back- 
bone can still be too large for them to process as shown in the 
experimental study in [19]. Also, using the reachability backbone
slows down the query performance of the transitive closure com- 
pression and hop labeling approaches (typically two or three times 
slower than the original approaches) on the graphs where they can 
still run. In addition, theoretically, the reachability backbone could
be applied recursively; this may further slow down query performance.
In [19], this option is not studied.

We also note that in [10], a new variant of reachability queries,
k-hop reachability, is introduced and studied. It asks whether vertex
u can reach v within k steps. This problem can be considered
a generalization of basic reachability, where k = ∞. A
k-reach indexing approach is developed, and the study shows that the
approach can handle basic reachability quite effectively (with
query performance comparable to the fastest transitive closure
compression approaches on small graphs). The k-reach indexing approach
is based on a vertex cover (a set of vertices that covers all the edges in
the graph). To compute the reachability from u to v, each vertex
only needs to access its immediate neighbors in the vertex cover;
the pairwise reachability between any two vertices in the vertex
cover is precomputed and fully materialized (for basic reachability
computation). It is easy to see that this vertex-cover-based approach
is a special case of the reachability backbone described above, with ε = 1.
However, this study directly materializes the transitive closure between
any pair of vertices in the vertex cover, whereas in [19] the existing
reachability indices are used. Thus, for very large graphs where
the vertex cover is often large, the pairwise reachability
materialization is not feasible. (This observation is also confirmed through
our experimental study in Section 6.)

3. APPROACH OVERVIEW 

In a reachability oracle of graph G, each vertex v is labeled with
two sets: $L_{out}(v)$, which contains hops (vertices) v can reach; and
$L_{in}(v)$, which contains hops that can reach v. A labeling is
complete if and only if for any vertex pair where u → v,
$L_{out}(u) \cap L_{in}(v) \neq \emptyset$. The goal is to minimize the total
label size, i.e., $\sum_{u \in V} (|L_{out}(u)| + |L_{in}(u)|)$. A smaller
reachability oracle not only helps to fit the index in main memory, but
also speeds up the query processing (with $O(|L_{out}(u)| + |L_{in}(v)|)$
time complexity).
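
For small test graphs, the completeness condition and the total label size can be checked by brute force. The sketch below does materialize reachability, which is exactly what a scalable construction must avoid, so it is meant only as a validation aid; the names `descendants`, `is_complete` and `total_label_size` are hypothetical.

```python
def descendants(adj, s):
    """All vertices reachable from s (excluding s) by a plain DFS."""
    stack, seen = [s], set()
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def is_complete(adj, l_out, l_in):
    """True iff every reachable pair (u, v) shares at least one hop."""
    return all(
        l_out[u] & l_in[v]
        for u in adj
        for v in descendants(adj, u)
    )


def total_label_size(l_out, l_in):
    """Sum of |L_out(u)| + |L_in(u)| over all vertices."""
    return sum(len(s) for s in l_out.values()) + sum(len(s) for s in l_in.values())


# Toy chain a -> b -> c with hand-written labels.
adj = {"a": ["b"], "b": ["c"], "c": []}
l_out = {"a": {"a", "b"}, "b": {"b"}, "c": {"c"}}
l_in = {"a": {"a"}, "b": {"b"}, "c": {"b", "c"}}
print(is_complete(adj, l_out, l_in), total_label_size(l_out, l_in))
```

Removing hop b from the in-label of c in this toy example leaves the pair (a, c) uncovered, which the check reports as incomplete.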

As we mentioned before, though the existing set-cover based
approaches [11] [23] [8] [16] [4] can achieve an approximately optimal
labeling size within a logarithmic factor, their computational and
memory cost is prohibitively expensive for large graphs. The labeling
process not only needs to materialize the transitive closure, but
also uses an iterative set-cover procedure which repeatedly
invokes dense subgraph detection. The reason for such a complicated
algorithm is that the following two criteria need to be met: 1) a
labeling must be complete, and 2) we wish the labeling to be
minimal. The existing approach [11] [16] essentially transforms the
labeling problem into a set-cover problem, at the cost of
constructing the ground set (which is the entire transitive closure) and
dynamically generating and selecting good candidate sets (through
dense subgraph detection).

To achieve efficient labeling which can work on massive graphs,
the following issues have to be appropriately handled:

1. (Completeness without Transitive Closure): Can we guarantee
labeling completeness without materialization of the transitive
closure? Even compact [8] or reduced [16] materialization can be
expensive for large graphs. Thus, the key is whether a labeling
process can avoid the need to explicitly check whether a reachable pair
(against some form of transitive closure) is covered by the existing
labeling.

2. (Compactness without Optimization): Without the set-cover, 
it seems difficult to produce bounded approximate optimal labeling. 
But this does not mean that a compact reachability oracle cannot be 
produced. Clearly, each vertex should not record every valid hop 
in the labeling. In the set-cover framework, a price is computed 
to determine whether a vertex should be added to certain vertex 
labels. What other criteria can help determine the importance of 
hops (vertices) so that each vertex can be more selective in what it 
records? 

In this paper, we investigate how the hierarchical structure of a
DAG can help produce a complete and compact reachability oracle.
The basic idea is as follows: assuming a DAG can be represented
in a hierarchical (multi-level) structure, such that lower-level
reachability needs to go through the upper level (but not vice versa), then
we can recursively broadcast the upper-level labels to the
lower-level labels. In other words, the labels of lower-level vertices
($L_{in}$ and $L_{out}$) can directly utilize the already computed labels in
the upper level. Thus, on one side, by using the hierarchical
structure, the completeness of labeling can be automatically guaranteed.
On the other side, it provides an importance score (the level) for
every hop; and each vertex only records those hops whose levels are
higher than or equal to its own level. We note that there have been
several studies [24] [22] [12] [20] [3] [1] using the hierarchical structure
for shortest path distance computation on road networks; however, 
how to construct and utilize the hierarchical structure for reacha- 
bility computation has not been fully addressed. To the best of our 
knowledge, this is the first study to construct a fast and scalable 
reachability oracle based on hierarchical DAG decomposition. 

Now, to turn such an idea into a fast labeling algorithm for the
reachability oracle, the following two research questions need to be
answered: 1) What hierarchical structure representation of a DAG can
be used? 2) How should $L_{out}$ and $L_{in}$ be computed efficiently
using a given hierarchical structure? In this paper, we introduce two
fast labeling algorithms based on different hierarchical structures
of a DAG:

Hierarchical-Labeling (Section 4): In this approach, the hierarchical
structure is produced by a recursive reachability backbone
approach, i.e., finding a reachability backbone $G^*$ from the original
graph G and then applying the backbone extraction algorithm
on $G^*$. Recall that the reachability backbone is introduced by the
latest SCARAB framework [19], which aims to scale the existing
reachability computation approaches. Here we apply it recursively
to provide a hierarchical DAG decomposition. Given this, a fast
labeling algorithm is designed to quickly compute $L_{in}$ and $L_{out}$ one
vertex at a time in a level-wise fashion (from higher level to
lower level).

Distribution-Labeling (Section 5): In this approach, the sophisticated
reachability backbone hierarchy is replaced with the simplest
hierarchy, a total order, i.e., each vertex is assigned a unique level
in the hierarchy structure. Given this, instead of computing $L_{in}$ and
$L_{out}$ one vertex at a time, the labeling algorithm will distribute the
hops one by one (from higher order to lower order) to $L_{in}$ and $L_{out}$
of other vertices. The worst-case computational complexity of this
labeling algorithm is $O(n(n + m))$ (of the same order as transitive
closure computation), though in practice it is much faster than the
transitive closure computation.
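
To make the "distribute one hop at a time" idea concrete, the following is a minimal sketch, not the authors' algorithm as specified in Section 5. It assumes a pruning rule that this overview does not state: while spreading a hop by BFS, a vertex is skipped (and not expanded) if the labels built so far already connect it to the hop. The vertex ranking used in the example (product of in-degree plus one and out-degree plus one) and the name `distribution_labeling` are also assumptions for illustration.

```python
from collections import deque


def distribution_labeling(adj, order):
    """adj: dict vertex -> list of successors (every vertex is a key).
    order: vertices from highest to lowest rank (ranking is an assumption)."""
    radj = {v: [] for v in adj}
    for u, succs in adj.items():
        for w in succs:
            radj[w].append(u)
    l_out = {v: set() for v in adj}
    l_in = {v: set() for v in adj}

    def spread(hop, graph, covered, target):
        """BFS from hop over graph, adding hop to target[x] unless covered(x)."""
        queue, seen = deque([hop]), {hop}
        while queue:
            x = queue.popleft()
            if x != hop and covered(x):
                continue             # an earlier hop already covers this pair
            target[x].add(hop)
            for y in graph[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)

    for hop in order:
        # Vertices that reach hop record it in L_out (backward BFS).
        spread(hop, radj, lambda x: not l_out[x].isdisjoint(l_in[hop]), l_out)
        # Vertices reachable from hop record it in L_in (forward BFS).
        spread(hop, adj, lambda x: not l_out[hop].isdisjoint(l_in[x]), l_in)
    return l_out, l_in


# Toy DAG; rank by (in-degree + 1) * (out-degree + 1), highest first.
adj = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [], 5: [3]}
indeg = {v: 0 for v in adj}
for succs in adj.values():
    for w in succs:
        indeg[w] += 1
order = sorted(adj, key=lambda v: -(indeg[v] + 1) * (len(adj[v]) + 1))
l_out, l_in = distribution_labeling(adj, order)
print(l_out, l_in)
```

Each hop triggers at most one backward and one forward BFS, which matches the O(n(n + m)) bound on traversal work quoted above; the cost of the label intersections inside the pruning test is not counted in this sketch.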

In the experimental study (Section 6), through an extensive study
on both real and synthetic graphs, we found that both labeling
approaches not only are fast (up to an order of magnitude faster than
the best set-cover based approach [11] [16]) and work on massive
graphs, but most surprisingly, their label sizes are actually smaller
than those of the set-cover based approaches.

4. HIERARCHICAL LABELING 

Before we proceed to discuss the Hierarchical Labeling approach, 
let us formally introduce the one-side reachability backbone (first 
defined in [19] for scaling the existing reachability computation),
which serves as the basis for hierarchical DAG decomposition and 
the labeling algorithm. 

Definition 1. (One-Side Reachability Backbone [19]) Given a
DAG G and a locality threshold ε, the one-side reachability backbone
$G^* = (V^*, E^*)$ is defined as follows: 1) $V^* \subseteq V$, such that for
any vertex pair (u, v) in G with $d(u, v) = \epsilon$, there is a vertex $v^*$
with $d(u, v^*) \le \epsilon$ and $d(v^*, v) \le \epsilon$; 2) $E^*$ includes the edges
which link vertex pairs $(u^*, v^*)$ in $V^*$ with $d(u^*, v^*) \le \epsilon + 1$.

Note that $E^*$ can be simplified as a transitive reduction [19] (the
minimal edge set preserving the reachability). Since computing the
transitive reduction is as expensive as the transitive closure, rules like
the following can be applied: $(u^*, v^*) \in E^*$ can be removed if
there is another intermediate vertex $x \in V^*$ (neither $u^*$ nor $v^*$) with
$d(u^*, x) \le \epsilon$ and $d(x, v^*) \le \epsilon$.

EXAMPLE 4.1. As a simple example, let $V^*$ be a vertex cover
of G, i.e., at least one end of every edge in E is in $V^*$; and let $E^*$
contain all edges $(u^*, v^*) \in V^* \times V^*$ such that $d(u^*, v^*) \le 2$.
Then, $G^* = (V^*, E^*)$ is a one-side reachability backbone with ε = 1.
In Figure 1(b), $G_1$ is the reachability backbone of graph $G_0$
(Figure 1(a)) for ε = 2.

The important property of the one-side reachability backbone is
that for any non-local pair (u, v) with u → v and $d(u, v) > \epsilon$, there
always exist $u^* \in V^*$ and $v^* \in V^*$ such that $d(u, u^*) \le \epsilon$,
$d(v^*, v) \le \epsilon$, and $u^* \to v^*$. This property will serve as the key
tool for recursively computing $L_{out}$ and $L_{in}$. In [19], the authors
develop the FastCover algorithm, which employs an ε-step BFS from each
vertex to discover the one-side reachability backbone. They
also show that when ε = 2, the backbone can already be significantly
reduced. To simplify our discussion, in this paper, we will
focus on using the reachability backbone with ε = 2, though the
approach can be applied to other locality threshold values.

Below, Subsection 4.1 presents the hierarchical decomposition
of a DAG and the labeling algorithm using this DAG; Subsection 4.2
discusses the correctness of the labeling approach and its
time complexity.

4.1 Hierarchical DAG Decomposition and Labeling Algorithm

Let us start with the hierarchical DAG decomposition which is 
based on the reachability backbone. 

Definition 2. (Hierarchical DAG Decomposition) Given a DAG
G = (V, E), a vertex hierarchy is defined as
$V_0 = V \supset V_1 \supset V_2 \supset \cdots \supset V_h$, with corresponding
edge sets $E_0, E_1, E_2, \ldots, E_h$, such that $G_i = (V_i, E_i)$ is the
(one-side) reachability backbone of $G_{i-1} = (V_{i-1}, E_{i-1})$, where
$0 < i \le h$. The final graph $G_h = (V_h, E_h)$ is referred to as the
core graph.

Intuitively, the vertex hierarchy shows the relative importance 
of vertices in terms of reachability computation. The lower level 
reachability computation can be resolved using the higher level ver- 
tices, but not the other way around. In other words, the reachability 
(backbone) property is preserved through the vertex hierarchy. 

LEMMA 1. Assuming $u \in V_i$, $v \in V_i$, u reaches v in G
($u \xrightarrow{G} v$) iff u reaches v in $G_i$ ($u \xrightarrow{G_i} v$).
Furthermore, for any non-local vertex pair $(u_i, v_i) \in V_i \times V_i$ with
$d(u_i, v_i|G_i) > \epsilon$ (the distance in $G_i$), there always exist
$u_{i+1} \in V_{i+1}$ and $v_{i+1} \in V_{i+1}$ such that
$d(u_i, u_{i+1}|G_i) \le \epsilon$, $d(v_{i+1}, v_i|G_i) \le \epsilon$, and
$u_{i+1} \to v_{i+1}$.

Proof Sketch: The first claim (assuming $u \in V_i$, $v \in V_i$, u reaches
v in G iff u reaches v in $G_i$) can be proved
by induction. The base case where i = 1 is clearly true based
on the reachability backbone definition (the reachability backbone
preserves the reachability between vertices in the backbone as
they appear in the original graph). Assuming this is true for all
i < k, it also holds for i = k. This is because
for any $u \in V_i$, $v \in V_i$, we must have $u \in V_{i-1}$ and $v \in V_{i-1}$.
Based on the reachability backbone definition, we have
$u \xrightarrow{G_{i-1}} v$ iff $u \xrightarrow{G_i} v$. Then, by the induction
hypothesis, u reaches v in G ($u \xrightarrow{G} v$) iff u reaches v in
$G_i$ ($u \xrightarrow{G_i} v$). The second claim directly follows
from the reachability backbone definition. □

EXAMPLE 4.2. Figure 1 shows a vertex hierarchy for DAG $G_0$
(a), where $V_1 = \{5, 7, 9, \ldots, 40\}$ (b) and $V_2 = \{7, 25, 35, 40\}$
(c). $G_1$ is the (one-side) reachability backbone of $G_0$, and $G_2$ is the
corresponding (one-side) reachability backbone of $G_1$.

To utilize the hierarchical decomposition for labeling, let us
further introduce some notation related to the vertex hierarchy. Each
vertex v is assigned to a unique level: level(v) = i iff
$v \in V_i \setminus V_{i+1}$, where $0 \le i \le h$ and $V_{h+1} = \emptyset$.
(Later, we will show that each vertex is labeled at its corresponding
level using $G_i$ and the labels of vertices from higher levels.) Assuming
v is at level i, i.e., level(v) = i, let $N^k_{out}(v|G_i)$
($N^k_{in}(v|G_i)$) be v's k-degree outgoing (incoming) neighborhood,
which includes all the vertices v can reach (that can reach v) within k
steps in $G_i$. Finally, for any vertex v at level i < h, its corresponding
outgoing (incoming) backbone vertex set $B^{\epsilon}_{out}(v)$
($B^{\epsilon}_{in}(v)$) is defined as:

Vertex   L_in                   L_out
2        0, 2                   5, 7, 17, 25, 35
3        0, 1, 3                3, 7, 9, 20, 25, 27, 35, 40
4        1, 4                   4, 7, 9, 25, 27, 35, 40
6        0, 5, 6                6, 7, 14, 18, 25, 29, 35, 40
8        8, 9                   7, 8, 15, 20, 25, 35, 40
...
38       7, 25, 34, 35, 38      33, 38, 39, 40

(d) Hop Labeling for V_0

Figure 1: Running Examples of Hierarchical-Labeling




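$B^{\epsilon}_{out}(v) = \{u \in V_{i+1} \mid d(v, u|G_i) \le \epsilon$ and there is no other vertex
$x \in V_{i+1}$ such that $d(v, x|G_i) \le \epsilon \wedge d(x, u|G_i) \le \epsilon \; (v \to x \to u)\}$ (1)

$B^{\epsilon}_{in}(v) = \{u \in V_{i+1} \mid d(u, v|G_i) \le \epsilon$ and there is no other vertex
$y \in V_{i+1}$ such that $d(u, y|G_i) \le \epsilon \wedge d(y, v|G_i) \le \epsilon \; (u \to y \to v)\}$ (2)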
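
The outgoing backbone vertex set of Formula (1) can be sketched with bounded BFS. This is a direct, unoptimized reading of the definition; the names `within` and `outgoing_backbone_set`, and the representation of $G_i$ as an adjacency dictionary `adj_i` with the vertex set $V_{i+1}$ passed as `next_level`, are assumptions for illustration. The incoming set of Formula (2) follows by running the same code on the reversed graph.

```python
from collections import deque


def within(adj, src, eps):
    """Vertices at distance 1..eps from src, found by a depth-bounded BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if dist[x] == eps:
            continue                    # do not expand beyond eps steps
        for y in adj.get(x, []):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    del dist[src]
    return set(dist)


def outgoing_backbone_set(adj_i, v, next_level, eps=2):
    """Backbone vertices within eps of v that no other such vertex precedes."""
    candidates = {u for u in within(adj_i, v, eps) if u in next_level}
    result = set()
    for u in candidates:
        # u is dropped if another candidate x satisfies d(x, u) <= eps,
        # because v -> x -> u makes the label of x sufficient.
        dominated = any(u in within(adj_i, x, eps) for x in candidates if x != u)
        if not dominated:
            result.add(u)
    return result


# Toy graph: v -> a -> b with both a and b in the next level.
adj_i = {"v": ["a"], "a": ["b"], "b": []}
print(outgoing_backbone_set(adj_i, "v", {"a", "b"}))
```

Keeping only the non-dominated backbone vertices limits how many higher-level labels each vertex has to merge.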

Now, let us see how the labeling algorithm works given the
hierarchical decomposition. Contrary to the decomposition process,
which proceeds from the lower level to the higher level (like peeling),
the labeling proceeds from the higher level to the lower level.
Specifically, it first labels the core graph $G_h$ and then iteratively
labels the vertices at levels h − 1 down to 0.

Labeling Core Graph $G_h$: Theoretically, the diameter of the core
graph $G_h$ is no more than ε (the pairwise distance between any
vertex pair in $G_h$ is no more than ε), and thus no further reachability
backbone is needed ($V_{h+1} = \emptyset$). In this case, for a vertex $v \in V_h$
(level(v) = h), the basic labeling can be as simple as follows:

$L_{out}(v) = N^{\lceil \epsilon/2 \rceil}_{out}(v|G_h); \quad L_{in}(v) = N^{\lceil \epsilon/2 \rceil}_{in}(v|G_h)$ (3)

The labeling is clearly complete for $G_h$, as any reachable pair is
within distance ε. Alternatively, since the core graph is typically
rather small, we can also employ the existing 2-hop labeling
algorithm [11] [23] to perform the labeling for core graphs. Given this,
practically, the decomposition can be stopped when the vertex set
$V_h$ is small enough (typically fewer than 10K vertices) instead of making its
diameter less than or equal to ε.
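
The basic core labeling can be sketched as follows. The sketch assumes that both label sets use the ⌈ε/2⌉-step neighborhood and that each vertex also records itself, as the running example in this section does by default; the names `k_neighborhood` and `label_core` are hypothetical.

```python
import math
from collections import deque


def k_neighborhood(adj, v, k):
    """v together with every vertex within k steps of v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue
        for y in adj.get(x, []):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)


def label_core(adj_h, eps=2):
    """Label every core vertex with its ceil(eps/2)-step neighborhoods."""
    k = math.ceil(eps / 2)
    radj = {v: [] for v in adj_h}          # reversed core graph
    for u, succs in adj_h.items():
        for w in succs:
            radj[w].append(u)
    l_out = {v: k_neighborhood(adj_h, v, k) for v in adj_h}
    l_in = {v: k_neighborhood(radj, v, k) for v in adj_h}
    return l_out, l_in


# Toy core graph with diameter 2: a -> b -> c.
core = {"a": ["b"], "b": ["c"], "c": []}
l_out, l_in = label_core(core, eps=2)
print(l_out["a"] & l_in["c"])
```

For a pair at distance d ≤ ε, the midpoint of a shortest path lies within ⌈d/2⌉ steps of the source and ⌊d/2⌋ steps of the target, so it appears in both labels.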

Labeling Vertices with Lower Level i ($0 \le i < h$): After the core
graph is labeled, the remaining vertices will be labeled in a
level-wise fashion from higher level h − 1 to lower level (until level 0).
For each vertex v at level $0 \le i < h$, assuming all vertices at
the higher levels (> i) have been labeled ($L_{out}$ and $L_{in}$), the
following simple rule can be utilized for labeling v:

$L_{out}(v) = N^{\lceil \epsilon/2 \rceil}_{out}(v|G_i) \cup \Big( \bigcup_{u \in B^{\epsilon}_{out}(v|G_i)} L_{out}(u) \Big)$ (4)

$L_{in}(v) = N^{\lceil \epsilon/2 \rceil}_{in}(v|G_i) \cup \Big( \bigcup_{u \in B^{\epsilon}_{in}(v|G_i)} L_{in}(u) \Big)$ (5)

Basically, the label $L_{out}(v)$ ($L_{in}(v)$) at level i consists of two
parts: the outgoing (incoming) ⌈ε/2⌉-degree neighbors of v in $G_i$,
and the labels from its corresponding outgoing (incoming) backbone
vertex set $B^{\epsilon}_{out}(v|G_i)$ ($B^{\epsilon}_{in}(v|G_i)$). In particular, if ε = 2
(the typical locality threshold), then each vertex v basically records
its direct outgoing (incoming) neighbors in $G_i$ and the labels from
its backbone vertex set.

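Algorithm 1 Hierarchical-Labeling(G = (V, E))
 1: Perform Hierarchical Decomposition of G based on Definition 2
 2: Labeling core graph $G_h$
 3: $i \leftarrow h - 1$;
 4: while $i \ge 0$ {Labeling $V_i$ from higher level to lower} do
 5:    for each $v \in V_i \setminus V_{i+1}$ {labeling each vertex specific for $V_i$} do
 6:       $L_{out}(v) \leftarrow N_{out}^{\lceil \epsilon/2 \rceil}(v|G_i) \cup (\bigcup_{u \in B^{\epsilon}_{out}(v|G_i)} L_{out}(u))$
 7:       $L_{in}(v) \leftarrow N_{in}^{\lceil \epsilon/2 \rceil}(v|G_i) \cup (\bigcup_{u \in B^{\epsilon}_{in}(v|G_i)} L_{in}(u))$
 8:    end for
 9:    $i \leftarrow i - 1$;
10: end while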



Overall Algorithm: Algorithm 1 sketches the complete Hierarchical-Labeling
approach. Basically, we first perform the recursive hierarchical DAG
decomposition (Line 1). Then, the vertices in the core graph $G_h$ will be
labeled either by Formula 3 or using the existing 2-hop labeling approach
(Line 2). Finally, the while-loop performs the labeling from the higher
level $h-1$ to the lower levels iteratively (Lines 4-10), where each
vertex $v$ in level $i$ (Lines 5-9) will be labeled based on
Formulas 4 and 5.
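
The level-wise loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hierarchical decomposition is taken as already computed, and the inputs `levels`, `n_out`, `n_in`, `b_out` and `b_in` (per-vertex neighborhoods and backbone vertex sets) as well as the function name are hypothetical names introduced here, not the authors' interfaces.

```python
def hierarchical_labeling(levels, core_out, core_in, n_out, n_in, b_out, b_in):
    """Propagate labels from the core graph down to level 0 (sketch).

    levels:   list of vertex lists, ordered from level h-1 down to level 0;
              each list holds the vertices specific to that level.
    core_out: dict vertex -> L_out for the already-labeled core vertices.
    core_in:  dict vertex -> L_in for the already-labeled core vertices.
    n_out/n_in: dict vertex -> ceil(eps/2)-hop out/in neighbors in G_i.
    b_out/b_in: dict vertex -> outgoing/incoming backbone vertex set.
    """
    l_out = {v: set(s) for v, s in core_out.items()}
    l_in = {v: set(s) for v, s in core_in.items()}
    for level_vertices in levels:
        for v in level_vertices:
            # Local neighbors, then the labels of the (higher-level,
            # hence already labeled) backbone vertices.
            out_label = set(n_out[v])
            for u in b_out[v]:
                out_label |= l_out[u]
            in_label = set(n_in[v])
            for u in b_in[v]:
                in_label |= l_in[u]
            l_out[v], l_in[v] = out_label, in_label
    return l_out, l_in
```

Because every backbone vertex of $v$ lives at a higher level, its labels are final by the time $v$ is processed, which is why a single top-down pass suffices.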

EXAMPLE 4.3. Figure 1 illustrates the Hierarchical-Labeling
process, where Figure 1(c) shows the labeling of core graphs. Note
that for simplicity, each vertex by default records itself in both $L_{in}$
and $L_{out}$, and $\epsilon = 2$. Figure 1(b) shows the labeling for
vertices in $V_1$; and Table 1(c) illustrates the labeling of a few vertices
in $V_0$. Taking vertex 14 for example: $L_{in}(14)$ records its direct
incoming neighbors in $G_i$, $\{7, 14\}$ (and itself), and other labels
from the labels of its corresponding incoming backbone vertex set
$B^{\epsilon}_{in}(14|G_i) = \{7\}$. Thus, $L_{in}(14) = \{7, 14\}$. Now
$L_{out}(14)$ records its direct outgoing neighbors $\{14, 29\}$ and
$L_{out}$ of vertex 40 ($B^{\epsilon}_{out}(14|G_i) = \{40\}$).

4.2 Algorithm Correctness and Complexity 

In the following, we first prove the correctness of the
Hierarchical-Labeling algorithm, that is, that it produces a complete
labeling: for any vertex pair $(u, v)$, $u \to v$ iff
$L_{out}(u) \cap L_{in}(v) \neq \emptyset$. We then
discuss its time complexity.
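
Given a complete labeling, answering a query reduces to testing whether two label sets intersect. The sketch below assumes, as an implementation choice not stated in the text, that each label is stored as a sorted list of comparable vertex identifiers so that the test is a linear merge; the function name `reaches` is illustrative.

```python
def reaches(l_out_u, l_in_v):
    """Return True iff the sorted lists l_out_u and l_in_v share an element."""
    i = j = 0
    while i < len(l_out_u) and j < len(l_in_v):
        if l_out_u[i] == l_in_v[j]:
            return True  # common hop found: u reaches v
        if l_out_u[i] < l_in_v[j]:
            i += 1
        else:
            j += 1
    return False
```

The merge touches each label entry at most once, so the query cost is bounded by the sum of the two label sizes.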

THEOREM 1. The Hierarchical-Labeling approach (Algorithm 1)
produces a complete labeling for each vertex $v$ in graph $G$, such
that for any vertex pair $(u, v)$: $u \to v$ iff
$L_{out}(u) \cap L_{in}(v) \neq \emptyset$.

Proof Sketch: We prove the correctness through induction: assuming
Algorithm 1 produces the correct labeling for $V_{i+1}$, then it produces
the correct labeling for $V_i$. Basically, if for any vertex pair $u^*$
and $v^*$ in $V_{i+1}$, $u^* \to v^*$ iff
$L_{out}(u^*) \cap L_{in}(v^*) \neq \emptyset$, then we
would like to show that for any vertex pair $u$ and $v$ in $V_i$, this also
holds. To prove this, we consider four different cases for any $u$ and
$v$ in $V_i$: 1) $u \in V_i \setminus V_{i+1}$ and $v \in V_i \setminus V_{i+1}$;
2) $u \in V_i \setminus V_{i+1}$ and $v \in V_{i+1}$; 3) $u \in V_{i+1}$ and
$v \in V_i \setminus V_{i+1}$; and 4) $u \in V_{i+1}$ and $v \in V_{i+1}$.
Since case 4 trivially holds based on the induction hypothesis and cases 2
and 3 are symmetric, we will focus on proving cases 1 and 2.

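Case 1 ($u \in V_i \setminus V_{i+1}$ and $v \in V_i \setminus V_{i+1}$): We
observe: 1) $u \to v$ with $d(u, v) \le \epsilon$ (local pair) iff there is
$x \in V_i$, such that $d(u, x) \le \lceil \epsilon/2 \rceil$ and
$d(x, v) \le \lceil \epsilon/2 \rceil$, i.e.,
$N_{out}^{\lceil \epsilon/2 \rceil}(u|G_i) \cap N_{in}^{\lceil \epsilon/2 \rceil}(v|G_i) \neq \emptyset$;
and 2) $u \to v$ with $d(u, v) > \epsilon$ (non-local pair)
iff there are backbone vertices $u^*, v^* \in V_{i+1}$, such that
$d(u, u^*) \le \epsilon$, $d(v^*, v) \le \epsilon$ and $u^* \to v^*$. That is,
$L_{out}(u^*) \cap L_{in}(v^*) \neq \emptyset$
iff there are $x \in B^{\epsilon}_{out}(u|G_i)$ and
$y \in B^{\epsilon}_{in}(v|G_i)$, such that $x \to y$, i.e.,
$L_{out}(x) \cap L_{in}(y) \neq \emptyset$ (if there is $x \in V_{i+1}$, such
that $d(u, x) \le \epsilon$ and $d(x, u^*) \le \epsilon$, then we can always use
$x$ to replace $u^*$ for the above claim; if $u^* \to v^*$ then $x \to v^*$)
iff $(\bigcup_{x \in B^{\epsilon}_{out}(u|G_i)} L_{out}(x)) \cap (\bigcup_{y \in B^{\epsilon}_{in}(v|G_i)} L_{in}(y)) \neq \emptyset$.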
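Case 2 ($u \in V_i \setminus V_{i+1}$ and $v \in V_{i+1}$): We observe 1)
$u \to v$ with $d(u, v) \le \epsilon$ (local pair) iff either
$v \in B^{\epsilon}_{out}(u|G_i)$ ($v \in L_{out}(u)$ and $v \in L_{in}(v)$),
or there is $x \in B^{\epsilon}_{out}(u|G_i)$, such that $x \to v$,
i.e., $L_{out}(x) \cap L_{in}(v) \neq \emptyset$
iff $(\bigcup_{x \in B^{\epsilon}_{out}(u|G_i)} L_{out}(x)) \cap (\bigcup_{y \in B^{\epsilon}_{in}(v|G_i)} L_{in}(y)) \neq \emptyset$; and
2) $u \to v$ with $d(u, v) > \epsilon$ (non-local pair) iff there exists $x$
such that $x \in B^{\epsilon}_{out}(u|G_i)$ and $x \to v$, i.e.,
$L_{out}(x) \cap L_{in}(v) \neq \emptyset$
iff $(\bigcup_{x \in B^{\epsilon}_{out}(u|G_i)} L_{out}(x)) \cap (\bigcup_{y \in B^{\epsilon}_{in}(v|G_i)} L_{in}(y)) \neq \emptyset$.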

Thus, in all cases, we have the correct labeling for any vertex pair
$u$ and $v$ in $V_i$. Now, the core labeling is correct either based on
the basic case where the graph diameter is no more than $\epsilon$ or based
on the existing 2-hop labeling approaches [11], [23]. Together with
the above induction rule, we have that for any vertex pair in $V = V_0$,
the label is complete and we thus prove the claim. □
Complexity Analysis: The computational complexity of Algorithm 1
comes from three components: 1) the hierarchical DAG
decomposition, 2) the core graph labeling, and 3) the remaining
vertex labeling for levels from $h-1$ to 0. For the first component,
as we mentioned earlier, we can employ the FastCover algorithm [19]
iteratively to extract the reachability backbone vertices
$V_i$ and their corresponding graph $G_i$. The FastCover algorithm is
very efficient and to extract $G_{i+1}$ from $G_i$, it just needs to traverse
the $\epsilon$ neighbors of each vertex in $G_{i+1}$. Its complexity is
$O(\sum_{v \in V_i} (|N^{\epsilon}_{out}(v|G_i)| \log |N^{\epsilon}_{out}(v|G_i)| + |E^{\epsilon}_{out}(v|G_i)|))$,
where $E^{\epsilon}_{out}(v|G_i)$
is the set of edges $v$ can reach in $\epsilon$ steps. Also, we note that in
practice, the vertex set $V_i$ shrinks very quickly and after a few
iterations (5 or 6 typically for $\epsilon = 2$), the number of backbone
vertices is on the order of thousands (Section 6). We can also
limit the total number of iterations, such as bounding $h$ to be 10,
and/or stop the decomposition when $V_i$ is smaller than some
limit such as 10K. For the second component, if the diameter
is smaller than $\epsilon$ and Formula 3 is employed, it also has a linear
cost: $O(\sum_{v \in V_h} (|N^{\epsilon}_{out}(v|G_h)| + |E^{\epsilon}_{out}(v|G_h)| + |N^{\epsilon}_{in}(v|G_h)| + |E^{\epsilon}_{in}(v|G_h)|))$.
If we employ the existing 2-hop labeling approach [11], [23],
the cost can be $O(|V_h|^4)$. However, since $|V_h|$ is rather small,
the cost can be acceptable and in practice (Section 6), it is also quite
efficient. Finally, the cost to assign labels for all the remaining
vertices is linear in their neighborhood cardinality and the labeling size
of each vertex. It can be written as
$O(\sum_{v \in V \setminus V_h} (|N^{\epsilon}_{out}(v|G_h)| + |E^{\epsilon}_{out}(v|G_h)| + |N^{\epsilon}_{in}(v|G_h)| + |E^{\epsilon}_{in}(v|G_h)| + ML))$,
where $M$ is the maximal number of vertices in the backbone vertex set and
$L$ is the maximal number of vertices in any $L_{in}$ or $L_{out}$.

We note that for large graphs, the last component typically dominates
the total computational cost as we need to perform list merge
(set-union) operations to generate $L_{out}$ and $L_{in}$ for each vertex.
However, compared with the existing hop labeling approach,
Hierarchical-Labeling is significantly cheaper as there is no need
for materializing the transitive closure or running the set-cover
algorithm. The experimental study (Section 6) finds that the labeling size
produced by the Hierarchical-Labeling approach is comparable to that
produced by the expensive set-cover based optimization.

5. DISTRIBUTION LABELING 

The Hierarchical-Labeling approach provides a fast alternative
to produce a complete reachability oracle. Its labeling is dependent
on a reachability-based hierarchical decomposition and follows a
process similar to the classical transitive closure computation [26],
where the transitive closures of all incoming neighbors are merged
to produce the new transitive closure. However, the potential issue
is that when merging $L_{out}$ and $L_{in}$ of higher level vertices for
the lower level vertices, this approach does not (and cannot) check
whether any hop is redundant, i.e., whether its removal would still leave
a complete labeling. Given the current framework, it is hard to evaluate
the importance of each individual hop as it is being cascaded
into lower level vertices. Recall that for a vertex $v$, when computing
its $L_{out}(v)$ and $L_{in}(v)$, its corresponding backbone vertex sets
($B^{\epsilon}_{out}(v)$ and $B^{\epsilon}_{in}(v)$) only eliminate those
redundant backbones if they can be linked through a local vertex
(Formulas 1 and 2). Thus even if $u \in B^{\epsilon}_{out}(v)$, it may still
be redundant as there is another vertex $u' \in B^{\epsilon}_{out}(v)$ such
that $u' \to u$ (but $d(u', u)$ is large). However, this issue is related to
the difficulty of computing transitive reduction as mentioned earlier.

In light of these issues, we ponder the following: Can we perform
labeling without the recursive hierarchical decomposition?
Can we explicitly confirm the "power" or "importance" of an individual
hop as it is being added into $L_{out}$ and $L_{in}$? In this work,
we provide positive answers to these questions and along the way,
we discover a simple, fast, and elegant labeling algorithm, referred
to as Distribution-Labeling: 1) the recursive hierarchical decomposition
is replaced with a simple total order of vertices (the order
criterion can be as simple as a basic function of vertex degree);
2) each hop is explicitly verified and added into $L_{out}$ and $L_{in}$
only when it can cover some additional reachable pairs, i.e., it is
non-redundant. Surprisingly, the labeling size produced by this approach
is even smaller than the set-cover approach on all the available
benchmarking graphs used in the recent reachability studies
(Section 6).

In Subsection 5.1, we first introduce a simple yet fundamental
observation on hop-covering (given a hop, what vertex pairs can it
cover), which is the basis for the Distribution-Labeling algorithm;
in Subsection 5.2, we present the labeling algorithm and discuss
its properties.

5.1 Hop Coverage and Labeling Basis 

We first formally define the "covering power" of a hop and then 
study the relationship of two vertices in terms of their "covering 
power". 




Figure 2: Running Example of Distribution-Labeling. Panels: (a) Labeling for Cov(13); (b) Labeling for Cov({13, 7}).

Definition 3. (Hop Coverage) For vertex $v$, its coverage $Cov(v)$
is defined as $TC^{-1}(v) \times TC(v) = \{(u, w) : u \to v \text{ and } v \to w\}$.
Note that $TC^{-1}(v)$ is the reverse transitive closure of $v$, which
includes all the vertices reaching $v$. If for any pair $(u, w) \in Cov(v)$,
$L_{out}(u) \cap L_{in}(w) \neq \emptyset$, then we say $Cov(v)$ is covered
by the labeling. We also say $Cov(v)$ can be covered by $v$ if each
vertex $u$ reaching $v$ ($u \in TC^{-1}(v)$) has $v \in L_{out}(u)$ and each
vertex $w$ being reached by $v$ ($w \in TC(v)$) has $v \in L_{in}(w)$.
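
As a small illustration of Definition 3, the sketch below computes $Cov(v)$ by two breadth-first searches. It assumes the graph is a dict mapping every vertex to a list of successors and, as a convention chosen for this sketch, that $TC(v)$ and $TC^{-1}(v)$ both contain $v$ itself. The helper names are illustrative only.

```python
from collections import deque


def closure(adj, source):
    """Vertices reachable from `source` in `adj`, including `source`."""
    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def reverse_graph(succ):
    """Build the predecessor map of a successor map."""
    pred = {v: [] for v in succ}
    for v, ws in succ.items():
        for w in ws:
            pred.setdefault(w, []).append(v)
    return pred


def coverage(succ, v):
    """Cov(v) = TC^{-1}(v) x TC(v) as a set of (u, w) pairs."""
    ancestors = closure(reverse_graph(succ), v)
    descendants = closure(succ, v)
    return {(u, w) for u in ancestors for w in descendants}
```

For the path 1, 2, 3 the coverage of the middle vertex is {(1, 2), (1, 3), (2, 2), (2, 3)}: every pair whose connecting path can be routed through vertex 2.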

Given this, the labeling $L_{out}$ and $L_{in}$ is complete if it covers
$Cov(V) = \bigcup_{v \in V} Cov(v)$, i.e., for any $(u, w) \in Cov(V)$,
$L_{out}(u) \cap L_{in}(w) \neq \emptyset$.
To achieve a complete labeling, let us start with
$Cov(\{v, v'\}) = Cov(v) \cup Cov(v')$. We study how to use only $v$ and $v'$
to cover $Cov(\{v, v'\})$. Specifically, we consider the following question:
assuming $v$ has been recorded by $L_{out}(u)$ for every $u \in TC^{-1}(v)$
and by $L_{in}(w)$ for every $w \in TC(v)$, then in order to cover the
reachability pairs in $Cov(\{v, v'\})$ when only $v'$ can serve as the
additional hop, what vertices should record $v'$ in their $L_{out}$ and
$L_{in}$?

To answer this question, we consider three cases: 1) $v$ and $v'$
are incomparable, i.e., $v \not\to v'$ and $v' \not\to v$; 2) $v' \to v$; and
3) $v \to v'$. For the first case, the labeling is straightforward: each
$u \in TC^{-1}(v')$ needs to record $v' \in L_{out}(u)$ and each
$w \in TC(v')$ needs to record $v' \in L_{in}(w)$. Note that in the worst
case, this is needed in order to recover pairs such as
$TC^{-1}(v') \times \{v'\}$ and $\{v'\} \times TC(v')$. For Cases 2 and 3,
Lemma 2 provides the answer.

LEMMA 2. Let $L_{out}(u) = \{v\}$ for every $u \in TC^{-1}(v)$ and
$L_{in}(w) = \{v\}$ for every $w \in TC(v)$. If $v' \to v$, then with
$L_{out}(u) = \{v, v'\}$ for $u \in TC^{-1}(v')$ and $L_{in}(w) = \{v'\}$ for
$w \in TC(v') \setminus TC(v)$ (other labels remain the same),
$Cov(\{v, v'\})$ is covered (using only hops $v$ and $v'$). If $v \to v'$,
then with $L_{out}(u) = \{v'\}$ for $u \in TC^{-1}(v') \setminus TC^{-1}(v)$
and $L_{in}(w) = \{v, v'\}$ for $w \in TC(v')$ (other labels remain the
same), $Cov(\{v, v'\})$ is covered (using only hops $v$ and $v'$).

Proof Sketch: We will focus on proving the case where $v' \to v$ as
the case $v \to v'$ is symmetric. We first note that if $v' \to v$, then
$TC^{-1}(v') \subseteq TC^{-1}(v)$ and $TC(v') \supseteq TC(v)$. Since
$Cov(v) = TC^{-1}(v) \times TC(v)$ is already covered by $v$, the uncovered
pairs in $Cov(\{v, v'\})$ can be written as

$$Cov(\{v, v'\}) \setminus Cov(v) = TC^{-1}(v') \times (TC(v') \setminus TC(v))$$

Given this, adding $v'$ to $L_{out}(u)$ where $u \in TC^{-1}(v')$ and to
$L_{in}(w)$ where $w \in TC(v') \setminus TC(v)$ can thus cover all the pairs
in $Cov(\{v, v'\})$. □

Figure 2 (continued). Panels: (c) Labeling for Cov({13, 7, 25}); (d) Basic Labeling.



EXAMPLE 5.1. Figure 2(a) shows the labeling for $Cov(13)$ and
Figure 2(b) shows that for $Cov(\{13, 7\})$ where $7 \to 13$. In particular,
$TC^{-1}(13) = TC^{-1}(7) \cup \{11\}$ and $TC(13) \subset TC(7)$.
For all $u \in TC^{-1}(7)$, we have $L_{out}(u) = \{7, 13\}$ and for all
$w \in TC(7) \setminus TC(13)$, we have $L_{in}(w) = \{7\}$.

Given Lemma 2, we consider the following general scenario: for
a subset of hops $V_s \subseteq V$, assume $L_{out}$ and $L_{in}$ are
correctly labeled using only hops in $V_s$ to cover $Cov(V_s)$. Now how can
we cover $Cov(V_s \cup \{v'\})$ by adding only the additional hop $v'$ to
$L_{in}$ and $L_{out}$? The following theorem provides the answer (Lemma 2
can be considered a special case):

Theorem 2. (Basic Labeling) Given a subset of hops $V_s \subseteq V$,
let $L_{out}(u) \subseteq V_s$ and $L_{in}(u) \subseteq V_s$ be complete for
covering $Cov(V_s)$, i.e., for any $(u, v) \in Cov(V_s)$,
$L_{out}(u) \cap L_{in}(v) \neq \emptyset$.
To cover $Cov(V_s \cup \{v'\})$ using the additional hop $v'$, the following
labeling is complete:

$$L_{out}(u) \leftarrow L_{out}(u) \cup \{v'\}, \quad u \in TC^{-1}(v') \setminus TC^{-1}(X) \qquad (6)$$
$$L_{in}(w) \leftarrow L_{in}(w) \cup \{v'\}, \quad w \in TC(v') \setminus TC(Y) \qquad (7)$$

where $X = TC^{-1}(v') \cap V_s$ includes all the vertices in $V_s$ reaching
$v'$ and $Y = TC(v') \cap V_s$ includes all the vertices in $V_s$
that can be reached by $v'$; $TC^{-1}(X) = \bigcup_{v \in X} TC^{-1}(v)$ and
$TC(Y) = \bigcup_{v \in Y} TC(v)$.

The theorem and its proof are illustrated in Figure 2(d).

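Proof Sketch: We first observe the following relationships between
the (reverse) transitive closure of $v'$ and $X$, $Y$:

$$TC^{-1}(v') \supseteq TC^{-1}(X); \quad TC(v') \subseteq TC(v),\ v \in X;$$
$$TC(v') \supseteq TC(Y); \quad TC^{-1}(v') \subseteq TC^{-1}(v),\ v \in Y;$$

Thus, following a proof similar to that of Lemma 2, we can see that

$$\begin{aligned}
Cov(V_s \cup \{v'\}) &= Cov(V_s) \cup TC^{-1}(v') \times TC(v') \\
&= Cov(V_s) \cup \big((TC^{-1}(v') \setminus TC^{-1}(X)) \cup TC^{-1}(X)\big) \times \big((TC(v') \setminus TC(Y)) \cup TC(Y)\big) \\
&= Cov(V_s) \cup (TC^{-1}(v') \setminus TC^{-1}(X)) \times (TC(v') \setminus TC(Y)) \\
&\quad \cup (TC^{-1}(v') \setminus TC^{-1}(X)) \times \textstyle\bigcup_{v \in Y} TC(v) \\
&\quad \cup TC^{-1}(X) \times (TC(v') \setminus TC(Y)) \\
&\quad \cup TC^{-1}(X) \times TC(Y) \\
&= Cov(V_s) \cup (TC^{-1}(v') \setminus TC^{-1}(X)) \times (TC(v') \setminus TC(Y)),
\end{aligned}$$

since $(TC^{-1}(v') \setminus TC^{-1}(X)) \times TC(Y) \subseteq Cov(V_s)$;
$TC^{-1}(X) \times (TC(v') \setminus TC(Y)) \subseteq Cov(V_s)$; and
$TC^{-1}(X) \times TC(Y) \subseteq Cov(V_s)$.

Thus, by adding $v'$ to $L_{out}(u)$, $u \in TC^{-1}(v') \setminus TC^{-1}(X)$,
and to $L_{in}(w)$, $w \in TC(v') \setminus TC(Y)$, the labeling will be
complete to cover $Cov(V_s \cup \{v'\})$. □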

EXAMPLE 5.2. Figure 2(c) shows an example of $Cov(\{13, 7\} \cup \{25\})$,
where $X = \{13, 7\}$ (both can reach 25) and $Y = \emptyset$. Thus
25 is added to $L_{out}(u)$, $u \in TC^{-1}(25) \setminus (TC^{-1}(13) \cup TC^{-1}(7))$,
and to $L_{in}(w)$, $w \in TC(25)$.
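
The update of Theorem 2 (Formulas 6 and 7) can be written down directly if one is willing to materialize closures. The sketch below does exactly that for illustration only; it is not the efficient procedure, and it assumes a successor map `succ`, a predecessor map `pred`, closures that include the vertex itself, and illustrative names `closure` and `add_hop`.

```python
from collections import deque


def closure(adj, source):
    """Vertices reachable from `source` in `adj`, including `source`."""
    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def add_hop(succ, pred, l_out, l_in, processed, v_new):
    """Apply Formulas 6 and 7 for the new hop `v_new` (naive sketch).

    processed is the set V_s of hops already distributed; labels are
    updated in place.
    """
    ancestors = closure(pred, v_new)      # TC^{-1}(v')
    descendants = closure(succ, v_new)    # TC(v')
    x = ancestors & processed             # processed hops reaching v'
    y = descendants & processed           # processed hops reached by v'
    tc_inv_x = set().union(*(closure(pred, v) for v in x)) if x else set()
    tc_y = set().union(*(closure(succ, v) for v in y)) if y else set()
    for u in ancestors - tc_inv_x:
        l_out[u].add(v_new)
    for w in descendants - tc_y:
        l_in[w].add(v_new)
```

On the path 1, 2, 3, 4 with hops distributed in the order 2 then 3, vertex 1 never records hop 3: it already reaches 3 through the processed hop 2, which is exactly the saving the theorem describes.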

5.2 Distribution-Labeling Algorithm 

In the following, based on Lemma 2 and Theorem 2, we introduce
the Distribution-Labeling algorithm, which will iteratively
distribute each vertex $v$ to $L_{out}$ and $L_{in}$ of other vertices to
cover $Cov(V_s \cup \{v\})$ ($V_s$ includes the processed vertices).
Intuitively, it first selects a vertex $v_1$ and provides a complete
labeling for $Cov(v_1)$; then it selects the next vertex $v_2$ and provides
a complete labeling for $Cov(\{v_1, v_2\})$ based on Lemma 2. It continues
this process, at each iteration $i$ selecting a new vertex $v_i$ and
producing the complete labeling for $Cov(V_s \cup \{v_i\})$ based on
Theorem 2, where $V_s$ includes all the $i-1$ vertices which have been
processed. The complete labeling will be produced when $V_s = V$.

Given this, two issues need to be resolved for this labeling process:
1) What should be the order in selecting vertices, and 2) How
can we quickly compute $X$ (processed vertices which can reach
the current vertex $v_i$) and $Y$ (processed vertices $v_i$ can reach), and
identify $u \in TC^{-1}(v_i) \setminus TC^{-1}(X)$ and
$w \in TC(v_i) \setminus TC(Y)$?
Vertex Order: The vertex order can be considered an extreme hierarchical
decomposition, where each level contains only one vertex.
Furthermore, the higher the level of the vertex, the more important
it is, the earlier it will be selected for covering, and the
more vertices are likely to record it in their $L_{out}$ and $L_{in}$ lists.
There are many approaches for determining the vertex order. For
instance, if following the set-cover framework, the vertex can be
dynamically selected to be the cheapest in covering new pairs, i.e.,
the one minimizing

$$\frac{|TC^{-1}(v_i) \setminus TC^{-1}(X)| + |TC(v_i) \setminus TC(Y)|}{|Cov(V_s \cup \{v_i\}) \setminus Cov(V_s)|}.$$

However, this is computationally expensive. We may also use $|Cov(v_i)|$,
which measures the covering power of vertex $v_i$, but this still needs to
compute the transitive closure. In this study, we found that the following
rank function, $(|N_{out}(v)| + 1) \times (|N_{in}(v)| + 1)$, which measures
the vertex pairs with distance no more than 2 being covered by $v$, is a
good candidate and can provide compact labeling. Indeed, we note that a
similar criterion is used in [19] for selecting the reachability backbone
as well.
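
The degree-based rank function is cheap to compute. A minimal sketch follows, assuming the graph is a successor map and that ties are broken by vertex identifier; the tie-break and the name `rank_vertices` are choices made here for determinism, not something the text prescribes.

```python
def rank_vertices(succ):
    """Order vertices by (|N_out(v)| + 1) * (|N_in(v)| + 1), highest first."""
    in_deg = {v: 0 for v in succ}
    for v, ws in succ.items():
        for w in ws:
            in_deg[w] = in_deg.get(w, 0) + 1

    def score(v):
        return (len(succ.get(v, ())) + 1) * (in_deg[v] + 1)

    # Negative score gives descending order; the vertex id breaks ties.
    return sorted(in_deg, key=lambda v: (-score(v), v))
```

For a small graph where vertex 3 has two incoming and two outgoing edges and all other vertices have a single edge, vertex 3 scores 9 against 2 for the rest and is therefore distributed first.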

Labeling $L_{out}$ and $L_{in}$: Given vertex $v_i$, we need to find (1)
$u \in TC^{-1}(v_i) \setminus TC^{-1}(X)$, i.e., the vertices reaching $v_i$
but not reaching any $v$ such that $v \to v_i$ and $v$ has a higher order
(already processed); and (2) $w \in TC(v_i) \setminus TC(Y)$, i.e., the
vertices which can be reached by $v_i$ but cannot be reached by any $v$ such
that $v_i \to v$ and $v$ has a higher order. The straightforward way of
solving (1) is to perform a reversed traversal and visit (expand) the
vertices based on the reversed topological order; then once the visited
vertex has a higher order than $v_i$, all its descendants in the traversal
(including itself) will be colored (flagged) to be excluded from adding
$v_i$ to $L_{out}$; thus $v_i$ will be added to $L_{out}$ for all uncolored
vertices during the reverse traversal process. A similar ordered traversal
process can be used for solving (2). However, the (reverse) ordered
traversal needs a priority queue, which results in
$O(|V| \log |V| + |E|)$ complexity at each iteration. In this work, we
utilize a more efficient approach that can effectively prune the traversal
space and avoid the priority queue, which is illustrated in Algorithm 2.

Algorithm 2 Distribution-Labeling(G = (V, E))
 1: Rank vertices in G in certain order;
 2: for each $v_i \in V$ {from higher order to lower} do
 3:    Perform Reverse BFS starting from $v_i$, and for each vertex $u$ being visited:
 4:    if $L_{out}(u) \cap L_{in}(v_i) \neq \emptyset$ then
 5:       Do not add $v_i$ to $L_{out}(u)$ nor expand $u$;
 6:    else
 7:       Add $v_i$ into $L_{out}(u)$ and expand $u$ in the reverse BFS;
 8:    end if
 9:    Perform BFS starting from $v_i$, and for each vertex $w$ being visited:
10:    if $L_{in}(w) \cap L_{out}(v_i) \neq \emptyset$ then
11:       Do not add $v_i$ to $L_{in}(w)$ nor expand $w$;
12:    else
13:       Add $v_i$ into $L_{in}(w)$ and expand $w$ in the BFS;
14:    end if
15: end for

In Algorithm 2, the iterative labeling process is sketched in the
for-each loop (Lines 2 to 15). The main procedure for computing
$u \in TC^{-1}(v_i) \setminus TC^{-1}(X)$ for labeling $L_{out}$ is outlined
in Lines 3-8. The main idea is that when visiting a vertex $u$, once
$L_{out}(u) \cap L_{in}(v_i)$ is no longer empty, we can simply exclude $u$
and its descendants in the reverse BFS from consideration, i.e.,
$u \in TC^{-1}(X)$ (Lines 4-6). Intuitively, this is because there exists a
vertex $v$, such that $u \to v \to v_i$ and $v$ has order higher than $v_i$.
Similarly, the procedure that computes $w \in TC(v_i) \setminus TC(Y)$ for
labeling $L_{in}$ is outlined in Lines 9-14. Here, the condition
$L_{in}(w) \cap L_{out}(v_i) \neq \emptyset$ is utilized to prune $w$ and its
descendants to determine the $L_{in}$ labeling.
Figure 2 illustrates the labeling process based on Algorithm 2 for
the first three vertices 13, 7, and 25.
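
A compact Python rendering of Algorithm 2 may help to see how little machinery the pruned traversals need. It is a sketch under stated assumptions rather than the authors' C++ code: the input is a DAG given as a successor map that lists every vertex as a key, `order` is the precomputed vertex ranking (highest first), labels are Python sets, and the helper name `_pruned_bfs` is illustrative.

```python
from collections import deque


def _pruned_bfs(start, adj, labels, probe):
    """Add `start` to labels[u] for every visited u not already covered.

    probe is the opposite-direction label of `start`; a non-empty
    intersection means an earlier hop already links u and `start`.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if labels[u] & probe:
            continue  # covered by a higher-order hop: prune, do not expand
        labels[u].add(start)
        for nxt in adj[u]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)


def distribution_labeling(succ, order):
    """Return (l_out, l_in) for a DAG, processing vertices in `order`."""
    pred = {v: [] for v in succ}
    for v, ws in succ.items():
        for w in ws:
            pred[w].append(v)
    l_out = {v: set() for v in succ}
    l_in = {v: set() for v in succ}
    for v_i in order:
        # Lines 3-8: reverse BFS distributes v_i into L_out of ancestors.
        _pruned_bfs(v_i, pred, l_out, l_in[v_i])
        # Lines 9-14: forward BFS distributes v_i into L_in of descendants.
        _pruned_bfs(v_i, succ, l_in, l_out[v_i])
    return l_out, l_in
```

Note that the reverse pass runs before the forward pass, so $v_i$ itself is recorded in $L_{out}(v_i)$ and then in $L_{in}(v_i)$; on a DAG the two labels of $v_i$ cannot share a higher-order hop, so $v_i$ is never pruned from its own traversals.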

5.3 Completeness, Compactness, and Complexity

In the following, we discuss the labeling completeness (correct- 
ness), compactness (non-redundancy), and time complexity. 

THEOREM 3. (Completeness) The Distribution-Labeling algorithm
(Algorithm 2) produces a complete $L_{out}$ and $L_{in}$ labeling,
i.e., for any vertex pair $(u, v)$, $u \to v$ iff
$L_{out}(u) \cap L_{in}(v) \neq \emptyset$.

Proof Sketch: We need to show that the algorithm correctly identifies
1) $u \in TC^{-1}(v_i) \setminus TC^{-1}(X)$ and 2)
$w \in TC(v_i) \setminus TC(Y)$. They are symmetric and we will focus on 1).
Note that for $u \in TC^{-1}(v_i) \setminus TC^{-1}(X)$, we need to exclude
any vertex $u'$ such that $u' \to v \to v_i$, where $v$ is already processed
(has higher order than $v_i$). Assuming the labeling is complete for
$Cov(V_s)$, where $V_s = \{v_1, \dots, v_{i-1}\}$, then
$L_{out}(u') \cap L_{in}(v_i) \neq \emptyset$ (Line 4). If
$u'$ should be excluded, then for its descendants in the reverse BFS
traversal the condition will also be true and they should also be excluded.
Furthermore, the reverse BFS can visit all vertices where this condition
does not hold, i.e., $L_{out}(u) \cap L_{in}(v_i) = \emptyset$, and thus
$u \in TC^{-1}(v_i) \setminus TC^{-1}(X)$. □



Theorem 3 shows that the Distribution-Labeling algorithm is correct;
but how compact is the labeling? The following theorem
shows an interesting non-redundancy property of the produced labeling,
i.e., no hop can be removed from $L_{in}$ or $L_{out}$ while preserving
completeness. We note that this property has not been investigated
before in the existing studies on reachability oracle and hop label- 

ing GU|S1E@[T5|E|. 

THEOREM 4. (Non-Redundancy) The Distribution-Labeling algorithm
(Algorithm 2) produces a non-redundant $L_{out}$ and $L_{in}$ labeling,
i.e., if any hop $h$ is removed from an $L_{out}$ or $L_{in}$ label set,
then the labeling becomes incomplete.

Proof Sketch: We will show that 1) for any
$u \in TC^{-1}(v_i) \setminus TC^{-1}(X)$, $v_i$ cannot be removed from
$L_{out}(u)$; and 2) for any $w \in TC(v_i) \setminus TC(Y)$, $v_i$ cannot be
removed from $L_{in}(w)$. Note that when $v_i$ is being added to $L_{out}(u)$
and $L_{in}(w)$, it is non-redundant as the new labeling at least covers
$(TC^{-1}(v_i) \setminus TC^{-1}(X)) \times \{v_i\}$ and
$\{v_i\} \times (TC(v_i) \setminus TC(Y))$.

However, will any later processed vertex $v_j$, such that $i < j$,
make $v_i$ redundant? The answer is no because in this case (still
focusing on the above pairs covered by $v_i$), $u \to v_j \to v_i$ (or
$w \leftarrow v_j \leftarrow v_i$), but the order of $v_i$ is higher than
that of $v_j$ and $v_j$ will not be added into the $L_{out}$ or $L_{in}$ of
$v_i$. In other words, for any vertex pair
in $(TC^{-1}(v_i) \setminus TC^{-1}(X)) \times \{v_i\}$ or
$\{v_i\} \times (TC(v_i) \setminus TC(Y))$, $v_i$
is the only hop linking these pairs, i.e.,
$L_{out}(u) \cap L_{in}(v_i) = \{v_i\}$
and $L_{out}(v_i) \cap L_{in}(w) = \{v_i\}$. Thus, $v_i$ is non-redundant for
all the vertices recording it as a label, i.e., $L_{out}(u)$,
$u \in TC^{-1}(v_i) \setminus TC^{-1}(X)$, and $L_{in}(w)$,
$w \in TC(v_i) \setminus TC(Y)$. □

As we discussed earlier, Hierarchical-Labeling does not have this
property; we can see this through counter-examples. For instance,
in Figure 1(b), 17 is redundant for $L_{out}(5)$. However, to remove
these cases, the transitive reduction would have to be performed,
which is expensive. Furthermore, whether the labels produced by
the existing set-cover based approach [11] are redundant or not
remains an open question, though we conjecture they might be redundant.

Time Complexity: The worst case computational complexity of
Algorithm 2 can be written as $O(|V|(|V| + |E|)L)$, where $L$ is the
maximal labeling size. However, because the conditions in Lines 4 and 10
can significantly prune the search space, and $L$ is typically rather
small, Distribution-Labeling can perform the labeling very efficiently.
In the experimental study (Section 6), we will show that Algorithm 2
is on average more than an order of magnitude faster than
the existing hop labeling and has comparable or faster labeling time
than the state-of-the-art reachability indexing approaches on large
graphs. Its labeling size is also small and, surprisingly, even smaller
than that of the greedy set-cover based labeling approaches in most of the
cases. This may be evidence that the labeling of the existing
set-cover based approach [11] is redundant.

6. EXPERIMENTAL EVALUATION 

In this section, we empirically evaluate the Hierarchical-Labeling
and Distribution-Labeling algorithms against the state-of-the-art
reachability computation approaches on a range of real graphs
which have been widely used for studying reachability [31], [29], [15],
[19], [10]. In particular, we are interested in the following questions
(in terms of the query efficiency, construction cost, and index (labeling)
size): 1) How do the reachability oracle approaches perform
compared with the transitive closure compression and online search
approaches? 2) How do these two approaches perform compared
with the existing 2-hop approaches, assuming the latter can complete
the labeling? 3) How do these two methods (Hierarchical-Labeling
and Distribution-Labeling) compare with one another?



       Small Real Graph                            Large Real Graph
Dataset       |V|      |E|      Dataset                 |V|          |E|
agrocyc     12684    13408      citeseer            693,947      312,282
amaze        3710     3600      go_uniprot        6,967,956   34,770,235
anthra      12499    13104      mapped_100K       2,658,702    2,660,628
ecoo        12620    13350      mapped_1M         9,387,448    9,440,404
hpycyc       4771     5859      uniprotenc_22m    1,595,443    1,595,442
human       38811    39576      uniprotenc_100m  16,087,294   16,087,293
kegg         3617     3908      uniprotenc_150m  25,037,599   25,037,598
mtbrv        9602    10245      citeseerx         6,540,399   15,011,259
nasa         5605     7735      cit-Patents       3,774,768   16,518,947
reactome      901      846
vchocyc      9491    10143
xmark        6080     7028

Table 1: Real datasets



6.1 Experimental Setup 

To answer these questions, we evaluate the Hierarchical-Labeling 
(HL) and Distribution-Labeling (DL) labeling algorithms against 
the state-of-the-art reachability computation approaches: 

1) PathTree [17], an improved version of Agrawal's tree-interval
method [2];

2) Nuutila's Interval [21], a transitive closure compression method,
recently demonstrated to be one of the fastest reachability computation
methods [29];

3) PWAH-8 [29], the latest bit-vector compression method for transitive
closure compression; PWAH-8 is its best variant [29];

4) K-Reach [10], a recent vertex-cover based approach for general
reachability computation, i.e., determining whether two vertices are
within distance k. Here k is set to be the total number of vertices
in the graph for basic reachability;

5) GRAIL [31], a scalable reachability indexing approach using random
DFS labeling (the number of intervals is set at 5, as suggested
by the authors);

6) 2HOP [11], Cohen et al.'s 2-hop labeling approach.

Here, PathTree (1), Interval (2), and PWAH-8 (3) are the state-of-the-art
transitive closure compression approaches; K-Reach (4)
is the latest general reachability approach and has been shown to be
very capable in dealing with basic reachability [10] (it can also be
considered as transitive closure compression as it materializes the
transitive closure for the vertex cover, a subset of vertices); GRAIL
(5) is the state-of-the-art online search approach; and 2HOP (6) is
the existing set-cover based hop labeling approach. In addition, we
also include the latest SCARAB method [19] for scaling PathTree
and speeding up GRAIL, referred to as PATH-TREE* and GRAIL*,
respectively. The locality parameter $\epsilon$ is set at 2 for SCARAB.

All the methods (including source code) except 2HOP are either
downloaded from the authors' websites or provided by the authors
directly. We have implemented 2HOP, Hierarchical-Labeling (HL), and
Distribution-Labeling (DL); 2HOP has been improved with
several fast heuristics [23], [16] to speed up its construction time.
All these algorithms are implemented in C++ based on the Standard
Template Library (STL).

In the experiments, we focus on reporting the three key measures
for reachability computation: query time, construction time,
and index size. For the query time, similar to the latest SCARAB
work [19], both equal and random reachability query workloads are
used. The equal query workload has about 50% positive (reachable
pairs) and about 50% negative (unreachable pairs) queries. Positive
queries are generated by sampling the transitive closure. The reported
query time is the running time of a total of 100,000 reachability
queries.
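
For small graphs, an equal workload of this kind can be generated by materializing the transitive closure and sampling from it. The sketch below is one possible construction, not the procedure used in the experiments: how negative pairs are drawn, the exclusion of trivial pairs (u, u), the fixed seed, and the names `closure` and `equal_workload` are all assumptions made here for illustration.

```python
import random
from collections import deque


def closure(adj, source):
    """Vertices reachable from `source` in `adj`, including `source`."""
    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def equal_workload(succ, n_queries, seed=0):
    """Return n_queries (u, v) pairs, half reachable and half unreachable.

    Materializes all pairs, so it is only suitable for small graphs that
    contain at least one reachable and one unreachable pair.
    """
    rng = random.Random(seed)
    vertices = sorted(set(succ) | {w for ws in succ.values() for w in ws})
    reach = {v: closure(succ, v) - {v} for v in vertices}
    positives = [(u, w) for u in vertices for w in sorted(reach[u])]
    negatives = [(u, w) for u in vertices for w in vertices
                 if u != w and w not in reach[u]]
    half = n_queries // 2
    queries = [rng.choice(positives) for _ in range(half)]
    queries += [rng.choice(negatives) for _ in range(n_queries - half)]
    rng.shuffle(queries)
    return queries
```

A random workload, by contrast, would simply draw both endpoints uniformly, which on sparse DAGs yields mostly negative queries.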

All experiments are performed on a Linux 2.6.32 machine with 
Intel Xeon 2.67GHz CPU and 32GB RAM. 

6.2 Experimental Results 



Dataset      GRAIL  GRAIL*  PATH-TREE  PATH-TREE*  K-REACH  PWAH-8  INTERVAL     2HOP      HL     DL
agrocyc     189.84   55.01       1.11        6.20     1.54    7.86      2.59     3.78    4.30   2.11
amaze       343.47   18.43       1.16       12.64     1.44    3.55      3.08     3.03    2.95   2.22
anthra      124.24   43.86       1.28        6.51     1.48    7.74      2.58     3.79    3.93   2.11
ecoo        122.11   55.45       1.10        6.29     1.50    7.69      2.71     3.86    4.41   2.14
hpycyc       87.76   15.30       1.06       12.00     1.45    8.62      1.47     3.83    4.03   2.29
human       185.38   68.54       1.16        6.84     1.78    4.45      3.23     3.61    2.50   2.30
kegg        272.12   26.90       1.19       13.10     1.54    4.75      2.59     3.18    3.44   2.38
mtbrv       115.41   49.22       1.06        6.44     1.47    7.20      2.59     3.95    5.14   2.10
nasa        135.78   32.41       1.37       14.44     2.16   18.41      4.87     4.45    4.07   3.72
reactome    111.75   15.74       1.12        9.66     1.81   12.10      3.01     3.10    2.87   2.18
vchocyc     107.65   44.89       1.04        6.50     1.45    7.86      2.58     3.75    3.81   2.09
xmark       134.72   91.72       1.42       14.55     1.93   35.77      4.89     5.94    6.52   3.79

Table 2: Query Time (ms) of 100K Equal Queries on Small Real Datasets



Dataset      GRAIL  GRAIL*  PATH-TREE  PATH-TREE*  K-REACH  PWAH-8  INTERVAL     2HOP      HL     DL
agrocyc      29.04    3.64       1.40        4.47     1.17    2.58      2.20     4.22    4.45   3.40
amaze       501.99   12.75       1.83        9.72     2.31    3.67      4.41     3.96    4.12   3.77
anthra       29.57    3.49       1.39        4.36     1.22    1.30      2.17     4.15    4.37   3.37
ecoo         30.37    3.67       1.32        4.45     1.45    2.52      2.19     4.15    4.46   3.41
hpycyc       28.82    3.17       1.31        4.21     2.00    3.07      2.33     3.81    3.95   3.62
human        33.61    3.97       1.48        5.77     1.22    1.46      1.41     2.67    4.81   3.85
kegg        616.65    3.60       1.88        4.41     2.58    4.52      4.63     4.11    4.35   3.98
mtbrv        28.90    8.29       1.33        8.08     1.11    1.37      2.17     3.90    4.07   3.34
nasa         28.20    5.03       1.66        4.99     2.74    7.99      5.33     4.82    5.26   5.40
reactome     32.92    3.52       2.97        4.37     3.00    7.86      3.40     3.52    3.37   3.70
vchocyc      29.62    8.85       1.36        6.54     1.42    2.61      2.24     4.04    3.79   3.38
xmark        63.73   17.16       1.71       10.88     2.27   11.16      4.47     5.48    5.36   5.44

Table 3: Query Time (ms) of 100K Random Queries on Small Real Datasets



Dataset      GRAIL  GRAIL*  PATH-TREE  PATH-TREE*  K-REACH  PWAH-8  INTERVAL     2HOP      HL     DL
agrocyc      22.62   27.20     128.10       68.91   284.50    5.01      3.72   245.60  120.75  12.62
amaze         7.35   10.38     357.40       27.84   330.37    4.47      3.21  2672.16   43.16   4.14
anthra       14.11   24.97      88.20       64.16   246.33    4.14      2.90   241.05   89.44  12.43
ecoo         12.76   26.01      94.54       66.88   282.14    4.98      3.67   254.64   92.16  12.49
hpycyc        4.75   12.96      39.02       23.25   223.63    2.74      1.84   199.20   41.48   5.24
human        71.24   72.29     298.24      143.08   296.47    5.30      4.12   417.55  155.19  37.38
kegg          4.08   26.38     435.99       44.85   411.80    5.56      2.24  2877.98   48.33   2.35
mtbrv         9.38   17.82      71.72       46.98   249.31    2.19      2.99   208.31  115.31   9.80
nasa         10.18    7.72      49.37       26.58  1637.47    9.83      6.06   835.88  143.46   8.88
reactome      1.21   20.17       8.25       26.43    35.15    1.23      0.78    80.68   25.38   1.02
vchocyc       9.32   16.67      70.29       45.86   260.10    4.39      3.22   224.31   65.30   9.53
xmark        11.07   30.24     109.30       50.49   806.25   10.43      5.89  1557.01   53.23   8.70

Table 4: Construction Time (ms) of Small Real Datasets



In the following, we report the experimental results on small
graphs first and then on large graphs. These graphs have been
widely used for studying reachability computation [30] [8] [17] [16]
[33] |n]ll [29] [15] [l(3 [19]. In Table [1], the first three columns
give the names, number of vertices, and number of edges of the
coalesced DAGs derived from each original graph. The last three
columns give the same information for the large real graphs.
Small Graphs: Table [2] reports the query time, using the equal
query load, of the reachability oracle approaches (2HOP,
Hierarchical-Labeling (HL), and Distribution-Labeling (DL)) against
the state-of-the-art transitive closure compression approaches
(PWAH-8, INTERVAL, PATH-TREE, K-REACH) and online search (GRAIL), as
well as some of their SCARAB counterparts (GRAIL* and PATH-TREE*).
Table [3] reports the query time using the random query load.

We make the following important observations on the query time:
1) On small graphs, PATH-TREE outperforms the other methods, though
K-REACH is fairly close (as it is quite similar to transitive closure
materialization). Interestingly, the reachability oracle methods turn
out to be quite comparable. In particular, Distribution-Labeling (DL)
is consistently about 2 times slower than PATH-TREE, and is even
faster than the other transitive closure compression approaches,
INTERVAL and PWAH-8, on the equal query load.

2) Compared to the existing set-cover based labeling approach 2HOP,
Hierarchical-Labeling (HL) is quite comparable (slightly slower),
but the query time of Distribution-Labeling (DL) is only 2/3 of that
of 2HOP.

3) The reachability oracle approaches are slightly slower on the
random query load than on the equal query load. This is because, to
determine that vertex u cannot reach vertex v, the query processing
has to completely scan L_out(u) and L_in(v).
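
The cost asymmetry in observation 3) can be illustrated with a small sketch. It assumes that each label is stored as a sorted list of integer vertex ids and that a query is answered by a merge-style scan; the function name reaches_with_cost and the step counter are hypothetical and exist only to show how far the scan runs.

```python
from typing import List, Tuple


def reaches_with_cost(l_out_u: List[int], l_in_v: List[int]) -> Tuple[bool, int]:
    """Merge-scan two sorted hop lists.

    Returns (reachable, steps), where steps counts loop iterations.
    Assumes both lists are sorted in increasing order.
    """
    i = j = steps = 0
    while i < len(l_out_u) and j < len(l_in_v):
        steps += 1
        if l_out_u[i] == l_in_v[j]:
            return True, steps      # common hop found: stop early
        if l_out_u[i] < l_in_v[j]:
            i += 1
        else:
            j += 1
    return False, steps             # one list exhausted without a match


hit = reaches_with_cost([1, 4, 7, 9], [4, 8])    # the lists share hop 4
miss = reaches_with_cost([1, 4, 7, 9], [2, 8])   # no common hop
```

A positive query can stop at the first shared hop, whereas a negative query keeps scanning until one of the two lists is exhausted.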

Table [4] shows the construction time of the different reachability
indices on small graphs. We observe that K-REACH and 2HOP are the
slowest. This is understandable, as K-REACH needs to perform
vertex-cover discovery and materialize the transitive closure for the
vertex cover, and 2HOP needs to perform the expensive greedy set
cover and completely materialize the transitive closure. INTERVAL
and PWAH-8 turn out to be the fastest, even faster than the online
search approach GRAIL, as the latter still needs to perform random
DFS a few times (in this study, we choose the number

Figure 3: Index Size on Small Real Graphs (in terms of the number of integers used in the indices)
Figure 4: Index Size on Large Real Graphs (in terms of the number of integers used in the indices)



Dataset              GRAIL   GRAIL*  PATH-TREE  PATH-TREE*  K-REACH   PWAH-8  INTERVAL      2HOP      HL     DL
citeseer             63.40    18.60       4.93       18.40     9.29    20.63     12.28      4.51    7.71   3.70
go.uniprot           77.61    23.26          -       20.37        -    41.93     17.02     16.04    6.21  12.82
mapped.100K         253.82    32.98       6.72       16.05        -    90.61      6.04      5.06    5.15   6.26
mapped.1M           762.24    30.31       8.40       16.87        -    46.59      6.31      5.59    6.07   7.28
uniprotenc_22m       52.99     9.26       6.19       14.70        -    23.13     15.50  47598.40    4.42   5.07
uniprotenc_100m      82.69    15.43          -       22.92        -    29.82     19.83         -    5.67   5.40
uniprotenc_150m      79.15    18.11          -       30.18        -    31.06     20.35         -    6.52   5.80
citeseerx          2012.33  2358.31          -           -        -    76.33      8.76         -  210.19   5.99
cit-Patents         403.91   141.02          -           -        -  2538.92         -         -       -  35.01



Table 5: Query Time (ms) of 100K Equal Queries on Large Real Datasets



Dataset              GRAIL   GRAIL*  PATH-TREE  PATH-TREE*  K-REACH   PWAH-8  INTERVAL      2HOP      HL     DL
citeseer             40.16     6.14       4.37        8.40     5.21    12.39      9.64      6.97    4.70   4.71
go.uniprot           47.63     9.03          -       13.57        -    52.52     20.76     13.02   12.02  11.64
mapped.100K          52.40     4.99       5.88        7.37        -     4.94      4.99      6.47    6.68   8.81
mapped.1M            55.00     5.33       8.75        9.49        -     5.60      6.73      7.08    9.78   9.33
uniprotenc_22m       40.54     8.65       9.11       12.50        -    21.87     15.19      4.43    5.88   7.04
uniprotenc_100m      53.01    10.35          -       17.55        -    28.30     20.07         -    7.47   7.40
uniprotenc_150m      56.63    10.66          -       18.83        -    29.14     23.12         -   10.65   7.86
citeseerx          2585.63    94.86          -           -        -    39.82     13.39         -   23.70   8.78
cit-Patents         501.53   110.13          -           -        -  1766.25         -         -       -  20.49



Table 6: Query Time (ms) of 100K Random Queries on Large Real Datasets



Dataset              GRAIL   GRAIL*  PATH-TREE  PATH-TREE*  K-REACH   PWAH-8  INTERVAL      2HOP       HL       DL
citeseer             2,011    1,250     18,025       2,192  187,182      487       307    14,054    2,232      528
go.uniprot          32,358   23,454          -   5,058,641        -   34,373    20,664   252,540  279,132   16,706
mapped.100K          6,220    5,362     26,667       9,775        -      448       419     9,760   10,141    1,902
mapped.1M           28,303   22,147    103,265      42,475        -    2,399     3,777    52,190   45,490    6,894
uniprotenc_22m       5,034    2,830  9,801,660      43,734        -    1,408     1,064   102,679    5,209    1,004
uniprotenc_100m     66,285   43,050          -   8,206,130        -   16,330    11,624         -   67,270   13,854
uniprotenc_150m    101,556   69,032          -  18,900,437        -   27,202    18,863         -  119,570   21,015
citeseerx           17,564   21,206          -           -        -   17,006     7,015         -  182,068    9,909
cit-Patents         15,669   42,175          -           -        -  935,457         -         -        -  114,583



Table 7: Construction Time (ms) of Large Real Datasets 



to be 5, as used in [31]). Both Hierarchical-Labeling (HL)
and Distribution-Labeling (DL) are much more efficient in labeling:
Hierarchical-Labeling is on average 5 times faster than 2HOP,
whereas Distribution-Labeling is consistently about 20 times faster
(and in some cases more than two orders of magnitude faster) than
2HOP. In fact, Distribution-Labeling even has a faster construction
time than GRAIL and is quite comparable to INTERVAL and PWAH-8.
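
The speedups quoted above can be checked against Table 4 with a short calculation. The sketch below copies the 2HOP, HL, and DL construction times from Table 4 and prints the per-dataset ratios; the use of the median as a summary is our assumption, since the text does not state which average is meant.

```python
from statistics import median

# Construction times (ms) copied from Table 4: (2HOP, HL, DL) per dataset.
TABLE4 = {
    "agrocyc":  (245.60, 120.75, 12.62),
    "amaze":    (2672.16, 43.16, 4.14),
    "anthra":   (241.05, 89.44, 12.43),
    "ecoo":     (254.64, 92.16, 12.49),
    "hpycyc":   (199.20, 41.48, 5.24),
    "human":    (417.55, 155.19, 37.38),
    "kegg":     (2877.98, 48.33, 2.35),
    "mtbrv":    (208.31, 115.31, 9.80),
    "nasa":     (835.88, 143.46, 8.88),
    "reactome": (80.68, 25.38, 1.02),
    "vchocyc":  (224.31, 65.30, 9.53),
    "xmark":    (1557.01, 53.23, 8.70),
}


def speedups(table):
    """Map each dataset to (2HOP/HL, 2HOP/DL) construction-time ratios."""
    return {name: (two_hop / hl, two_hop / dl)
            for name, (two_hop, hl, dl) in table.items()}


ratios = speedups(TABLE4)
for name, (hl_x, dl_x) in ratios.items():
    print(f"{name:<9} HL {hl_x:7.1f}x   DL {dl_x:7.1f}x")
print("median HL speedup:", round(median(r[0] for r in ratios.values()), 1))
print("median DL speedup:", round(median(r[1] for r in ratios.values()), 1))
```

The ratios vary widely across datasets (amaze and kegg are far above the rest), which is why a robust summary such as the median may be more informative than a plain mean here.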

Figure [3] shows the index size of the different reachability index
methods, along with some of their SCARAB counterparts, on small
graphs. Here, PWAH-8 and INTERVAL outperform the others on index
size. It is interesting to observe that the labeling size of
Hierarchical-Labeling (HL) is quite comparable to that of 2HOP (which
is also consistent with the query time). More importantly, and rather
surprisingly, the labeling size of Distribution-Labeling (DL) is
consistently smaller than that of 2HOP, the set-cover based
optimization labeling that targets minimizing the labeling size.
This, we believe, can be attributed to the effectiveness of the total
order based hierarchy and the non-redundant labeling process.
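
To make the idea of an order-driven, non-redundant labeling concrete, the following is a simplified sketch in the spirit of Distribution-Labeling, not a reproduction of the paper's algorithm. It assumes a DAG with vertices 0..n-1 and ranks vertices by the product (out-degree + 1) * (in-degree + 1); this ranking and all function names are assumptions made for illustration.

```python
from collections import deque


def build_labels(n, edges):
    """Pruned, order-driven hop labeling on a DAG with vertices 0..n-1."""
    succ = [[] for _ in range(n)]
    pred = [[] for _ in range(n)]
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)

    # Assumed ranking: larger (out-degree + 1) * (in-degree + 1) goes first.
    order = sorted(range(n),
                   key=lambda v: (len(succ[v]) + 1) * (len(pred[v]) + 1),
                   reverse=True)
    l_out = [set() for _ in range(n)]
    l_in = [set() for _ in range(n)]

    def sweep(hop, nbrs, is_covered, add_label):
        # BFS from hop; a vertex already covered by an earlier hop is
        # neither labeled nor expanded (this is the non-redundancy step).
        seen, queue = {hop}, deque([hop])
        while queue:
            u = queue.popleft()
            if is_covered(u):
                continue
            add_label(u)
            for w in nbrs[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)

    for hop in order:
        # Forward: hop reaches u, so hop may join L_in(u).
        sweep(hop, succ,
              lambda u: not l_out[hop].isdisjoint(l_in[u]),
              lambda u: l_in[u].add(hop))
        # Backward: u reaches hop, so hop may join L_out(u).
        sweep(hop, pred,
              lambda u: not l_out[u].isdisjoint(l_in[hop]),
              lambda u: l_out[u].add(hop))
    return l_out, l_in


def reaches(l_out, l_in, u, v):
    """u reaches v iff L_out(u) and L_in(v) share a hop."""
    return not l_out[u].isdisjoint(l_in[v])


def index_size(l_out, l_in):
    """Total number of integers stored, the measure used in Figures 3 and 4."""
    return sum(map(len, l_out)) + sum(map(len, l_in))


edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (5, 4)]
l_out, l_in = build_labels(6, edges)
```

Because a hop is recorded only where no earlier (higher-ranked) hop already covers the pair, no transitive closure is materialized and each label entry is needed by at least one pair at the time it is added.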
Large Graphs: Large graphs provide the real challenge for
reachability computation. We observe that only three methods, GRAIL,
PWAH-8, and Distribution-Labeling, are able to handle all of these
graphs (GRAIL* is the SCARAB variant for speeding up query
performance). Hierarchical-Labeling and INTERVAL can work on 8, and
PATH-TREE* can work on 7, out of 9 large graphs. K-REACH can only
process one graph, whereas PATH-TREE and 2HOP fail on 5 and 4 large
graphs, respectively.



Tables [5] and [6] report the query time using the equal and random
query load, respectively. We make the following observations:
1) On large graphs, the transitive closure compression approaches,
even on the graphs they can handle, become significantly slower. This
is expected: as the compressed transitive closure TC(v) becomes
larger, its search (linear or binary) becomes more expensive. Now,
the advantage of the reachability oracles becomes clear, as they
become the fastest in terms of query time (even faster than PATH-TREE
and INTERVAL, and consistently more than 5 times faster than PWAH-8).
2) Compared with the original 2HOP labeling, both
Hierarchical-Labeling and Distribution-Labeling have comparable query
performance on the graphs on which they all can run.
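
The remark about searching a compressed TC(v) can be illustrated with a generic interval lookup. This is only a sketch, assuming the closure of a vertex is stored as sorted, disjoint, inclusive intervals of vertex ids; it is not the actual INTERVAL or PWAH-8 implementation, and the function name and sample data are hypothetical.

```python
from bisect import bisect_right


def in_compressed_tc(starts, ends, target):
    """Binary-search sorted, disjoint, inclusive intervals for target."""
    k = bisect_right(starts, target) - 1   # last interval starting <= target
    return k >= 0 and target <= ends[k]


# Hypothetical compressed closure of one vertex: ids 2..5, 10..12, and 20.
starts = [2, 10, 20]
ends = [5, 12, 20]

found = in_compressed_tc(starts, ends, 11)
missing = in_compressed_tc(starts, ends, 6)
```

With binary search the lookup cost grows with the logarithm of the number of intervals, and with linear search it grows proportionally, so a closure that compresses poorly on a large graph makes every query slower.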

Table [7] shows the construction time on large graphs for all
methods. We observe that PWAH-8 and INTERVAL are very fast, though
as the graphs become larger, they become slower or cannot finish.
Distribution-Labeling turns out to be quite comparable (fastest on
several graphs). Hierarchical-Labeling can work on 8 out of 9 graphs,
and it shows significant improvement on 2 out of the 5 graphs which
2HOP can also process. Distribution-Labeling is on average one order
of magnitude faster than 2HOP on these five graphs.

Figure [4] shows the index size of the different approaches. The
results are quite consistent with the results on small graphs, on
those graphs where each method can work. In most cases, PWAH-8 and
INTERVAL have the smallest index size. 2HOP, Hierarchical-Labeling,
and Distribution-Labeling also perform well (better than GRAIL and
K-REACH). The labeling sizes of 2HOP, Hierarchical-Labeling, and
Distribution-Labeling are quite comparable; Distribution-Labeling has
a smaller labeling size than Hierarchical-Labeling and is very close
to (or better than) 2HOP on the graphs that 2HOP can run.

7. CONCLUSION 

In this paper, by introducing two simple, elegant, and effective
labeling approaches, Hierarchical-Labeling and Distribution-Labeling,
we are able to resolve an important open question in reachability
computation: the reachability oracle can be a powerful tool (or even
the most useful one) for handling real large graphs. Our experimental
results demonstrate that these approaches can handle graphs with
millions of vertices/edges (scalable), are the quickest in answering
reachability queries on large graphs (fast), and have comparable or
better labeling size than the set-cover based optimization approaches
(compact). In the future, we will investigate labeling on dynamic
graphs and how to apply these approaches to more general reachability
computation, such as the k-reach problem.